Abstract

We introduce the concept of average best m-term approximation widths with
respect to a probability measure on the unit ball or the unit sphere of $\ell_p^n$. We
estimate these quantities for the embedding $id : \ell_p^n \to \ell_q^n$ with $0 < p \le q \le \infty$
for the normalized cone and surface measure. Furthermore, we consider certain
tensor product weights and show that a typical vector with respect to such
a measure exhibits a strongly compressible (i.e. nearly sparse) structure. Such a
measure may therefore be used as a random model for sparse signals.
1 Introduction

1.1 Best m-term approximation

Let $m \in \mathbb{N}_0$ and let $\Sigma_m$ be the set of all sequences $x = \{x_j\}_{j=1}^\infty$ with

$$\|x\|_0 := \# \operatorname{supp} x = \#\{n \in \mathbb{N} : x_n \neq 0\} \le m.$$

Here $\#A$ stands for the number of elements of a set $A$. The elements of $\Sigma_m$ are
said to be m-sparse. Observe that $\Sigma_m$ is a non-linear subset of every
$\ell_q := \{x = \{x_j\}_{j=1}^\infty : \|x\|_q < \infty\}$, where

$$\|x\|_q := \begin{cases} \Big(\sum_{j=1}^\infty |x_j|^q\Big)^{1/q}, & 0 < q < \infty, \\ \sup_{j \in \mathbb{N}} |x_j|, & q = \infty. \end{cases}$$

For every $x \in \ell_q$, we define its best m-term approximation error by

$$\sigma_m(x)_q := \inf_{y \in \Sigma_m} \|x - y\|_q.$$
Moreover, for $0 < p \le q \le \infty$, we introduce the best m-term approximation widths

$$\sigma_m^{p,q} := \sup_{x : \|x\|_p \le 1} \sigma_m(x)_q.$$

The use of this concept goes back to Schmidt [H] and after the work of Oskolkov
[39] it was widely used in approximation theory, cf. [151 EB1 SS]. In fact, it is
the main prototype of nonlinear approximation [17]. It is well known that

$$2^{-1/p} (m+1)^{1/q - 1/p} \le \sigma_m^{p,q} \le (m+1)^{1/q - 1/p}, \qquad m = 0, 1, 2, \ldots. \tag{1}$$

The proof of (1) is based on the simple fact that (roughly speaking) the best m-term
approximation error of $x \in \ell_p$ is realized by subtracting the m largest coefficients
taken in absolute value. Hence,

$$\sigma_m(x)_q = \begin{cases} \Big(\sum_{j=m+1}^\infty (x_j^*)^q\Big)^{1/q}, & 0 < q < \infty, \\ x_{m+1}^* = \sup_{j \ge m+1} x_j^*, & q = \infty, \end{cases}$$

where $x^* = (x_1^*, x_2^*, \ldots)$ denotes the so-called non-increasing rearrangement [B] of
the vector $(|x_1|, |x_2|, |x_3|, \ldots)$.
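For a finite vector, the formula above can be evaluated directly by sorting. The following is a minimal sketch, not part of the original paper; the function name `best_m_term_error` and the sample vector are illustrative assumptions.

```python
import math

def best_m_term_error(x, m, q):
    """Return sigma_m(x)_q for a finite vector x (q may be math.inf)."""
    # Non-increasing rearrangement x* of (|x_1|, ..., |x_n|).
    rearranged = sorted((abs(t) for t in x), reverse=True)
    # Dropping the m largest entries leaves x*_{m+1}, x*_{m+2}, ...
    tail = rearranged[m:]
    if not tail:
        return 0.0
    if q == math.inf:
        return tail[0]  # this is x*_{m+1}
    return sum(t ** q for t in tail) ** (1.0 / q)

x = [0.5, -3.0, 0.0, 2.0, -1.0]          # hypothetical example vector
print(best_m_term_error(x, 1, math.inf))  # second largest absolute value
print(best_m_term_error(x, 2, 1))         # l_1 norm of the tail after removing two entries
```

The sketch relies only on the fact stated above, namely that the best m-term approximation keeps the m largest coefficients in absolute value.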

Let us recall the proof of (1) in the simplest case, namely $q = \infty$. The estimate
from above then follows by

$$\sigma_m(x)_\infty = \sup_{j \ge m+1} x_j^* = x_{m+1}^* \le \Big( (m+1)^{-1} \sum_{j=1}^{m+1} (x_j^*)^p \Big)^{1/p} \le (m+1)^{-1/p} \|x\|_p. \tag{2}$$

The lower estimate is supplied by taking

$$x = (m+1)^{-1/p} \sum_{j=1}^{m+1} e_j, \tag{3}$$

where $\{e_j\}_{j=1}^\infty$ are the canonical unit vectors.

For general q, the estimate from above in (1) may be obtained from (2) and
Hölder's inequality

$$\|x\|_q \le \|x\|_p^{\theta} \cdot \|x\|_\infty^{1-\theta}, \quad \text{where} \quad \frac{1}{q} = \frac{\theta}{p}. \tag{4}$$

The estimate from below follows for all q's by a simple modification of (3).
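The two-sided estimate (1) can be checked numerically on a concrete vector. The sketch below is an illustration under an assumption: as the "simple modification" of (3) it uses a vector with $2(m+1)$ equal entries normalized in $\ell_p$, which is one possible choice and not necessarily the one intended by the author. All function names are hypothetical.

```python
import math

def sigma(x, m, q):
    """Best m-term approximation error of a finite vector in l_q."""
    tail = sorted((abs(t) for t in x), reverse=True)[m:]
    if q == math.inf:
        return tail[0] if tail else 0.0
    return sum(t ** q for t in tail) ** (1.0 / q)

def flat_vector(m, p):
    # Assumed variant of (3): 2(m+1) equal entries with ||x||_p = 1.
    size = 2 * (m + 1)
    return [size ** (-1.0 / p)] * size

def check_bounds(p, q, m, tol=1e-12):
    """Check that sigma_m(x)_q of the flat vector lies between both sides of (1)."""
    inv_q = 0.0 if q == math.inf else 1.0 / q
    upper = (m + 1) ** (inv_q - 1.0 / p)
    lower = 2 ** (-1.0 / p) * upper
    value = sigma(flat_vector(m, p), m, q)
    return lower - tol <= value <= upper + tol

results = [check_bounds(p, q, m)
           for p in (0.5, 1.0, 2.0)
           for q in (2.0, 4.0, math.inf)
           for m in range(6)]
print(all(results))
```

Since the flat vector lies in the unit ball of $\ell_p$, its error is a lower bound for the width, so the check illustrates why the left-hand side of (1) cannot be improved beyond a constant.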
The discussion above exhibits two effects. 

(i) Best m-term approximation works particularly well when $1/p - 1/q$ is large,
i.e. if $p < 1$ and $q = \infty$.

(ii) The elements used in the estimate from below (and hence the elements where
the best m-term approximation performs worst) enjoy a very special structure.

Therefore, there is a reasonable hope, that the best m-term approximation could 
behave better, when considered in a certain average case. But first we point out two 
different interesting points of view on the subject. 
1.2 Connection to compressed sensing

The interest in $\ell_p$ spaces (and especially in their finite-dimensional counterparts $\ell_p^n$)
with $0 < p < 1$ was recently stimulated by the impressive success of the novel and
vastly growing area of compressed sensing as introduced in [51 [TU1 [HI [19]. Without
going much into the details, we only note that the techniques of compressed sensing
allow one to reconstruct a vector from an incomplete set of measurements utilizing the
prior knowledge that it is sparse, i.e. $\|x\|_0$ is small. Furthermore, this approach
may also be applied [2] to vectors which are compressible, i.e. $\|x\|_p$ is small for
(preferably small) $0 < p < 1$. Indeed, (1) tells us that such a vector x may be very
well approximated by sparse vectors. We point to [9, 24, 25, 42] for the current state
of the art of this field and for further references.

This leads in a very natural way to a question, which stands in the background 
of this paper, namely: 

What does a typical vector of the $\ell_p^n$ unit ball look like?

or, posed in an exact way:

Let $\mu$ be a probability measure on the unit ball of $\ell_p^n$. What is the mean value of
$\sigma_m(x)_q$ with respect to this measure?

Of course, the choice of $\mu$ plays a crucial role. There are several standard probability
measures which are connected to the unit ball of $\ell_p^n$ in a natural way, namely
(cf. Definitions 2 and 9)

(i) the normalized Lebesgue measure,

(ii) the $(n-1)$-dimensional Hausdorff measure restricted to the surface of the unit
ball of $\ell_p^n$ and correspondingly normalized,

(iii) the so-called normalized cone measure.

Unfortunately, it turns out that all three of these measures are "bad": a typical
vector with respect to any of them does not involve much structure and corresponds
rather to noise than to a signal (in the sense described below). Therefore, we are looking
for a new type of measures (cf. Definition 13) which would behave better from this
point of view.

1.3 Random models of noise and signals 

Random vectors play an important role in the area of signal processing. For example, 
if $n \in \mathbb{N}$ is a natural number, $\omega = (\omega_1, \ldots, \omega_n)$ is a vector of independent Gaussian
variables and $\varepsilon > 0$ is a real number, then $\varepsilon\omega$ is a classical model of noise, namely
the white noise. This model is used in the theory but also in real-life applications
of signal processing.

The random generation of a structured signal seems to be a more complicated
task. Probably the most common random model to generate sparse vectors, cf.
[3 fTBl [30l I40j, is the so-called Bernoulli-Gaussian model. Let again $n \in \mathbb{N}$ be a
natural number and $\varepsilon > 0$ be a real number. Also $\omega = (\omega_1, \ldots, \omega_n)$ stands for a
vector of independent Gaussian variables. Furthermore, let $0 < p < 1$ be a real
number and let $\varrho = (\varrho_1, \ldots, \varrho_n)$ be a vector of independent Bernoulli variables
defined as

$$\varrho_i = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1 - p. \end{cases}$$

The components of the random Bernoulli-Gaussian vector $x = (x_1, \ldots, x_n)$ are then
defined through

$$x_i = \varepsilon \varrho_i \cdot \omega_i, \qquad i = 1, \ldots, n. \tag{5}$$

Obviously, the average number of non-zero components of x is $k := pn$. Unfortunately,
if k is much smaller than n, then the concentration of the number of non-zero
components of x around k is not very strong. This becomes better if k gets larger.
But in that case, the model (5) resembles more and more the model of white noise.
In some sense, (5) represents rather a randomly filtered white noise than a structured
signal. It is one of the main aims of this paper to find a new measure such that a
random vector with respect to this measure would show a nearly sparse structure
without the need of random filtering.
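The weak concentration of the support size in the Bernoulli-Gaussian model (5) is easy to observe in a simulation. The following sketch is an illustration only; the function name, the seed and the parameter values ($n = 1000$, $p = 0.01$, 200 repetitions) are arbitrary assumptions and do not come from the paper.

```python
import random

def bernoulli_gaussian(n, p, eps, rng):
    """Draw x_i = eps * rho_i * omega_i as in (5)."""
    x = []
    for _ in range(n):
        rho = 1 if rng.random() < p else 0   # Bernoulli variable
        omega = rng.gauss(0.0, 1.0)          # standard Gaussian variable
        x.append(eps * rho * omega)
    return x

rng = random.Random(0)                       # fixed seed, for reproducibility only
n, p = 1000, 0.01                            # average support size k = p * n = 10
counts = [sum(1 for t in bernoulli_gaussian(n, p, 1.0, rng) if t != 0.0)
          for _ in range(200)]
mean = sum(counts) / len(counts)
spread = (sum((c - mean) ** 2 for c in counts) / len(counts)) ** 0.5
# The relative spread is of order 1 / sqrt(k), i.e. not small when k is small.
print(mean, spread, spread / mean)
```

The number of non-zero entries is binomially distributed, so its standard deviation relative to $k$ behaves like $1/\sqrt{k}$, which is the effect described in the paragraph above.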



1.4 Unit sphere 

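Let us describe the situation in the most prominent case, when $p = 2$, $m = 0$ and
$\mu = \mu_2$ is the normalized surface measure on the unit sphere $\mathbb{S}^{n-1}$ of $\mathbb{R}^n$. Furthermore,
we denote by $\gamma_n$ the standard Gaussian measure on $\mathbb{R}^n$ with the density

$$\frac{1}{(2\pi)^{n/2}} e^{-\|x\|_2^2/2}, \qquad x \in \mathbb{R}^n.$$

We use polar coordinates to calculate

$$\begin{aligned}
\int_{\mathbb{R}^n} \max_{j=1,\ldots,n} |x_j| \, d\gamma_n(x) &= \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} \max_{j=1,\ldots,n} |x_j| \cdot e^{-\|x\|_2^2/2} \, dx \\
&= \frac{\Omega_n}{(2\pi)^{n/2}} \int_0^\infty r^{n-1} \int_{\mathbb{S}^{n-1}} \max_{j=1,\ldots,n} |r x_j| \, e^{-\|rx\|_2^2/2} \, d\mu_2(x) \, dr \\
&= \frac{\Omega_n}{(2\pi)^{n/2}} \int_0^\infty r^{n} e^{-r^2/2} \, dr \cdot \int_{\mathbb{S}^{n-1}} \max_{j=1,\ldots,n} |x_j| \, d\mu_2(x) \\
&= \frac{\Omega_n}{(2\pi)^{n/2}} \int_0^\infty r^{n} e^{-r^2/2} \, dr \cdot \int_{\mathbb{S}^{n-1}} \sigma_0(x)_\infty \, d\mu_2(x),
\end{aligned} \tag{6}$$

where $\Omega_n$ denotes the area of $\mathbb{S}^{n-1}$. This formula connects the expected value of
$\sigma_0(x)_\infty$ with the expected value of the maximum of n independent Gaussian variables.
Using that this quantity is known to be equivalent to $\sqrt{\log(n+1)}$, cf. [33, (3.14)], and that

$$\int_0^\infty r^n e^{-r^2/2} \, dr = 2^{(n-1)/2} \, \Gamma((n+1)/2) \quad \text{and} \quad \Omega_n = \frac{2\pi^{n/2}}{\Gamma(n/2)},$$

one obtains

$$\int_{\mathbb{S}^{n-1}} \sigma_0(x)_\infty \, d\mu_2(x) \approx \sqrt{\frac{\log(n+1)}{n}}, \qquad n \in \mathbb{N}. \tag{7}$$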
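The equivalence (7) can be illustrated by a small Monte Carlo experiment. The sketch below is not from the paper; it uses the standard fact that a normalized standard Gaussian vector is uniformly distributed on the sphere, and the sample sizes and seed are arbitrary assumptions.

```python
import math
import random

def random_point_on_sphere(n, rng):
    """Uniform point on S^{n-1}, obtained by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]

def mean_largest_coordinate(n, trials, rng):
    """Monte Carlo estimate of the mean of sigma_0(x)_infty = max_j |x_j|."""
    total = 0.0
    for _ in range(trials):
        total += max(abs(t) for t in random_point_on_sphere(n, rng))
    return total / trials

rng = random.Random(1)  # fixed seed, for reproducibility only
for n in (10, 100, 1000):
    estimate = mean_largest_coordinate(n, 300, rng)
    # The last column should stay bounded away from 0 and infinity as n grows.
    print(n, estimate, estimate / math.sqrt(math.log(n + 1) / n))
```

The printed ratio only illustrates that the two sides of (7) are of the same order; it says nothing about the optimal constants.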



Several comments on (6) and (7) are necessary.

(i) Quantities similar to the left-hand side of (7) have been used in the study of
geometry of Banach spaces and local theory of Banach spaces for many years
and are treated in detail in the work of Milman [23, 35, 36]. Especially, if $\|\cdot\|_K$
is a norm in $\mathbb{R}^n$ and $K := \{x \in \mathbb{R}^n : \|x\|_K \le 1\}$ denotes the corresponding
unit ball, then the quantity

$$A_K = \int_{\mathbb{S}^{n-1}} \|x\|_K \, d\mu_2(x)$$

(and the closely connected median $M_K$ of $\|x\|_K$ over $\mathbb{S}^{n-1}$) plays a crucial role
in the Dvoretzky theorem [20, 22, 35] and, in general, in the study of Euclidean
sections of K, cf. [36, Section 5]. Furthermore, it is known that the case of
$K = [-1,1]^n$, when

$$A_K = \int_{\mathbb{S}^{n-1}} \max_{j=1,\ldots,n} |x_j| \, d\mu_2(x) = \int_{\mathbb{S}^{n-1}} \sigma_0(x)_\infty \, d\mu_2(x),$$

is extremal, cf. [35].

(ii) The connection between the expected value of a maximum of independent
Gaussian variables and the expected value of the largest coordinate of a random
vector on $\mathbb{S}^{n-1}$ is given just by integration in polar coordinates and is
one of the standard techniques in the local theory of Banach spaces. Due to
the result of [43], this holds true also for other values of p, even for $p < 1$,
with Gaussian variables replaced by variables with the density $c_p e^{-|t|^p}$. This
approach is nowadays classical in the study of the geometry and concentration
of measure phenomenon on the $\ell_p^n$-balls, cf. [3 O H EH [38l E].

(iii) For every $x \in \mathbb{S}^{n-1}$ we obtain easily that

$$\max_{j=1,\ldots,n} |x_j| \ge \Big( \frac{1}{n} \sum_{j=1}^n x_j^2 \Big)^{1/2} = \frac{1}{\sqrt{n}}.$$

Estimate (7) shows that the average value of $\max_{j=1,\ldots,n} |x_j|$ over $\mathbb{S}^{n-1}$ is
asymptotically larger only by a logarithmic factor. The detailed study of the
concentration of $\max_{j=1,\ldots,n} |x_j|$ around its expected (i.e. mean) value is known
as the concentration of measure phenomenon [32, 33, 36] and gives more accurate
information than the one included in (7). As our main interest lies in estimates
of average best m-term widths, cf. Definition 1, we do not investigate
the concentration properties in this paper and leave this subject to further
research.

(iv) The calculation (6) is based on the use of polar coordinates. For $p \neq 2$, the
normalized cone measure is exactly that measure for which a similar formula
holds, cf. (13). The estimates for the $(n-1)$-dimensional surface measure are later
obtained using its density with respect to the cone measure, cf. Lemma 10.

(v) As we want to keep the paper as self-contained as possible and to make
it readable also for readers without (almost) any stochastic background, we
prefer to use simple and direct techniques. For example, we rather use the
simple estimates in Lemma 5 than any of their sophisticated improvements
available in the literature.

(vi) The connection to random Gaussian variables explains why a random point
of $\mathbb{S}^{n-1}$ is sometimes referred to as white (or Gaussian) noise. It is usually not
associated with any reasonable (i.e. structured) signal, rather it represents a
good model for random noise.

1.5 Basic Definitions and Main Results 
1.5.1 Definition of average best m-term widths 

After describing the context of our work we shall now present the definition of the 
so-called average best m-term widths, which are the main subject of our study. 
First, we observe that

$$\sigma_m((x_1, \ldots, x_n))_q = \sigma_m((\varepsilon_1 x_1, \ldots, \varepsilon_n x_n))_q = \sigma_m((|x_1|, \ldots, |x_n|))_q$$

holds for every $x \in \mathbb{R}^n$ and $\varepsilon \in \{-1,+1\}^n$. Also all the measures which we shall
consider are invariant under any of the mappings

$$(x_1, \ldots, x_n) \to (\varepsilon_1 x_1, \ldots, \varepsilon_n x_n), \qquad \varepsilon \in \{-1, +1\}^n,$$

and therefore we restrict our attention only to $\mathbb{R}_+^n$ in the following definition.

Definition 1. Let $0 < p \le q \le \infty$ and let $n \ge 2$ and $0 \le m \le n-1$ be natural
numbers.

(i) We set

$$\Delta_p^n = \begin{cases} \{(t_1, \ldots, t_n) \in \mathbb{R}_+^n : \sum_{j=1}^n t_j^p = 1\}, & p < \infty, \\ \{(t_1, \ldots, t_n) \in \mathbb{R}_+^n : \max_{j=1,\ldots,n} t_j = 1\}, & p = \infty. \end{cases}$$

(ii) Let $\mu$ be a Borel probability measure on $\Delta_p^n$. Then

$$\sigma_m^{p,q}(\mu) = \int_{\Delta_p^n} \sigma_m(x)_q \, d\mu(x)$$

is called average surface best m-term width of $id : \ell_p^n \to \ell_q^n$ with respect to $\mu$.

(iii) Let $\nu$ be a Borel probability measure on $[0,1] \cdot \Delta_p^n$. Then

$$\sigma_m^{p,q}(\nu) = \int_{[0,1] \cdot \Delta_p^n} \sigma_m(x)_q \, d\nu(x)$$

is called average volume best m-term width of $id : \ell_p^n \to \ell_q^n$ with respect to $\nu$.

Let us observe that the estimates

$$\sigma_m^{p,q}(\mu) \le \sigma_m^{p,q} \quad \text{and} \quad \sigma_m^{p,q}(\nu) \le \sigma_m^{p,q}$$

follow trivially by Definition 1. Furthermore, the mapping $x \to \sigma_m(x)_q$ is continuous
and, therefore, measurable with respect to the Borel measure $\mu$.
1.5.2 Main results

After introducing the new notion of average best m-term width in Definition 1, we
study its behavior for the measures on $\Delta_p^n$ which are widely used in the literature. A
prominent role among them is played by the so-called normalized cone measure given
by

$$\mu_p(\mathcal{A}) = \frac{\lambda([0,1] \cdot \mathcal{A})}{\lambda([0,1] \cdot \Delta_p^n)}, \qquad \mathcal{A} \subset \Delta_p^n.$$

In Theorem 7 and Proposition 8 we provide basic estimates of $\sigma_m^{p,q}(\mu_p)$ for $q = \infty$
and $q < \infty$, respectively. Surprisingly enough, it turns out that (7) has its direct
counterpart for all $0 < p < \infty$. This means (as described above) that the coordinates
of a "typical" element of the surface of the $\ell_p^n$ unit ball are well concentrated around
the value $n^{-1/p}$. So, roughly speaking, it is only $\ell_p$-normalized noise.

Another well known probability measure on $\Delta_p^n$ is the normalized surface measure
$\varrho_p$, cf. Definition 9. We calculate in Lemma 10 the density of $\varrho_p$ with respect to $\mu_p$
to be equal to

$$\frac{d\varrho_p}{d\mu_p}(x) = c_p^{-1} \Big( \sum_{i=1}^n x_i^{2p-2} \Big)^{1/2},$$

where

$$c_p = \int_{\Delta_p^n} \Big( \sum_{i=1}^n x_i^{2p-2} \Big)^{1/2} d\mu_p(x)$$

is the normalizing constant. This result (which is a generalization of the work of
Naor and Romik [38] to the non-convex case $0 < p < 1$) might be of independent
interest for the study of the geometry of $\ell_p^n$ spheres. One observes immediately
that if $p < 1$ and one or more coordinates $x_i$ are going to zero, then this density
has a polynomial singularity and, therefore, gives more weight to areas close to
coordinate hyperplanes.

We then obtain in Theorem 12 an estimate of $\sigma_0^{p,\infty}(\varrho_p)$ from above. Although
the measure $\varrho_p$ concentrates around coordinate hyperplanes, it turns out that the
estimate from above of $\sigma_0^{p,\infty}(\mu_p)$ as obtained in Theorem 7 and the estimate of
Theorem 12 differ only in the constants involved.

The last part of this paper is devoted to the search for a new probability measure
on $\Delta_p^n$ which would "promote sparsity" in the sense that the mean value of $\sigma_m(x)_q$
decays rapidly with m. One possible candidate is presented in Definition 13 by
introducing a new class of measures $\theta_{p,\beta}$, which are given by their density with
respect to the cone measure $\mu_p$

$$\frac{d\theta_{p,\beta}}{d\mu_p}(x) = c_{p,\beta}^{-1} \cdot \prod_{i=1}^n x_i^{\beta}, \qquad x \in \Delta_p^n,$$

where $c_{p,\beta}$ is a normalising constant. We refer also to Remark 3 for an equivalent
characterisation.

We show that for an appropriate choice of $\beta$, namely $\beta = p/n - 1$, the expected
value of the m-th largest coefficient of elements of the $\ell_p^n$-unit sphere decays exponentially
with m. Namely, Theorem 16 provides estimates of $\sigma_{m-1}^{p,\infty}(\theta_{p,\beta})$, which
at the end imply that

$$\frac{C_p^1}{(\frac{1}{p}+1)^{m/p}} \le \liminf_{n \to \infty} \sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1}) \le \limsup_{n \to \infty} \sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1}) \le \frac{C_p^2}{(\frac{1}{p}+1)^{m/p}} \tag{8}$$

for two positive real numbers $C_p^1$ and $C_p^2$, which depend only on p.

This result (which is also simulated numerically in the very last section of this
paper) is in a certain way independent of n. This gives hope that one could
apply this approach also to the infinite-dimensional spaces $\ell_p$ or, using a suitable
discretization technique (like wavelet decomposition), also to some function spaces.
This remains a subject of our further research.

Of course, the class $\theta_{p,\beta}$ provides only one example of measures with rapid decay
of their average best m-term widths. We also leave the detailed study of other
measures with such properties open to future work.

Note added in the proof: Let us comment on the relation of our work with
recent papers of Cevher [12] and Gribonval, Cevher, and Davis [29]. Cevher uses
in [12] the concept of Order Statistics [16] to identify the probability distributions
whose independent and identically distributed (i.i.d.) realizations result typically in
p-compressible signals, i.e.

$$x_i^* \le C R \cdot i^{-1/p}.$$

Our approach here is a bit different and more connected to the geometry of $\ell_p^n$ spaces.
In accordance with [33], this leads to the study of $\ell_p^n$-normalized vectors with i.i.d.
components. This again allows us to better distinguish between the norm of such a
vector (i.e. its size or energy) and its direction (i.e. its structure).

The approach of the recent preprint [29] (which was submitted during the review
process of this work) comes much closer to ours. Their Definition 1 of "Compressible
priors" introduces the quantity called relative best m-term approximation error as

$$\frac{\sigma_m(x)_q}{\|x\|_q}.$$

The asymptotic behavior of this quantity for $x = (x_1, \ldots, x_n)$ being a vector with
i.i.d. components and $\liminf_{n \to \infty} \frac{m_n}{n} \ge \kappa \in (0,1)$ is then used to define q-compressible
probability distribution functions. In contrast to [29], we consider $\ell_q$ approximation
of $\ell_p$ normalized vectors and therefore our widths depend on two integrability parameters
p and q. Furthermore, we do not restrict the ratio m/n
to any specific regime and consider the average best m-term widths $\sigma_m^{p,q}(\mu)$ for all
$0 \le m \le n-1$. In the only case when we speak about asymptotics (i.e. (37) of
Theorem 16), we suppose m to be constant and n growing to infinity. Furthermore,
Theorem 1 of [29] shows that all distributions with bounded fourth moment do not
fit into their scheme and do not "promote sparsity". As we are interested in distributions
which are connected to the geometry of $\ell_p^n$-balls (i.e. generalized Gaussian
distribution and generalized Gamma distribution), it is exactly for that reason that
we change the parameters of the distribution $\theta_{p,\beta}$ in dependence of n. Although
quite inconvenient from the mathematical point of view, it is not really clear if this
presents a serious obstacle for application of our approach. But the investigation of
this goes beyond the scope of this work.
1.5.3 Structure of the paper

The paper is structured as follows. The rest of Section 1 gives some notation used
throughout the paper. Sections 2 and 3 provide estimates of the average best m-term
widths with respect to the cone and surface measure, respectively. In Section 4, we study a
new type of measures on the unit ball of $\ell_p^n$. We show that the typical element
with respect to those measures behaves in a completely different way compared
to the situations discussed before. Those results are illustrated by the numerical
experiments described in Section 5.

1.6 Notation 

We denote by $\mathbb{R}$ the set of real numbers, by $\mathbb{R}_+ := [0, \infty)$ the set of nonnegative
real numbers and by $\mathbb{R}^n$ and $\mathbb{R}_+^n$ their n-fold tensor products. The components of
$x \in \mathbb{R}^n$ are denoted by $x_1, \ldots, x_n$. The symbol $\lambda$ stands for the Lebesgue measure
on $\mathbb{R}^n$ and $\mathcal{H}$ for the $(n-1)$-dimensional Hausdorff measure in $\mathbb{R}^n$. If $A \subset \mathbb{R}^n$ and
$I \subset \mathbb{R}$ is an interval, we write $I \cdot A := \{tx : t \in I, x \in A\}$.
We shall very often use the Gamma function, defined by

$$\Gamma(s) := \int_0^\infty t^{s-1} e^{-t} \, dt, \qquad s > 0. \tag{9}$$

In one case, we shall also use the Beta function

$$B(p, q) := \int_0^1 t^{p-1} (1-t)^{q-1} \, dt = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}, \qquad p, q > 0, \tag{10}$$

and the digamma function

$$\Psi(s) := \frac{d}{ds} \log \Gamma(s) = \frac{\Gamma'(s)}{\Gamma(s)}, \qquad s > 0.$$

We recommend [1, Chapter 6] as a standard reference for both basic and more
advanced properties of these functions. We shall need Stirling's approximation
formula (which was implicitly used already in (7)) in its simplest form

$$\Gamma(t) = \sqrt{\frac{2\pi}{t}} \Big( \frac{t}{e} \Big)^t \Big( 1 + O\Big(\frac{1}{t}\Big) \Big), \qquad t > 0. \tag{11}$$

If $a = \{a_j\}_{j=1}^\infty$ and $b = \{b_j\}_{j=1}^\infty$ are real sequences, then $a_j \lesssim b_j$ means that
there is an absolute constant $C > 0$ such that $a_j \le C b_j$ for all $j = 1, 2, \ldots$. A similar
convention is used for $a_j \gtrsim b_j$ and $a_j \approx b_j$. The capital letter C with indices (i.e.
$C_p$) denotes a positive real number depending only on the highlighted parameters
and its meaning can change from one occurrence to another. If, for any reason,
we need to distinguish between several numbers of this type, we shall write for
example $C_p^1$ and $C_p^2$, as already done in (8).
2 Normalized cone measure

In this section, we study the average best m-term widths as introduced in Definition
1 for the most important measure (the so-called cone measure) on $\Delta_p^n$, which is
well studied in the literature within the geometry of $\ell_p$ spaces, cf. [38l HI EZl IS].
Essentially, we recover in Theorem 7 an analogue of the estimate (7) for all $0 < p < \infty$.

Definition 2. Let $0 < p \le \infty$ and $n \ge 2$. Then

$$\mu_p(\mathcal{A}) = \frac{\lambda([0,1] \cdot \mathcal{A})}{\lambda([0,1] \cdot \Delta_p^n)}, \qquad \mathcal{A} \subset \Delta_p^n,$$

is the normalized cone measure on $\Delta_p^n$.

If $\nu_p$ denotes the p-normalized Lebesgue measure, i.e.

$$\nu_p(A) = \frac{\lambda(A)}{\lambda([0,1] \cdot \Delta_p^n)}, \qquad A \subset [0,1] \cdot \Delta_p^n,$$

then the connection between $\nu_p$ and $\mu_p$ is given by

$$\nu_p(A) = n \int_0^\infty r^{n-1} \mu_p\Big( \Big\{ \frac{x}{\|x\|_p} : x \in A, \ \|x\|_p = r \Big\} \Big) dr. \tag{12}$$

The proof of (12) follows directly for sets of the type $[a, b] \cdot \mathcal{A}$ with $0 < a < b < \infty$ and
$\mathcal{A} \subset \Delta_p^n$ and is then finished by standard approximation arguments. The formula
(12) may be generalized to the so-called polar decomposition identity, cf. [7],

$$\frac{\int_{\mathbb{R}_+^n} f(x) \, d\lambda(x)}{\lambda([0,1] \cdot \Delta_p^n)} = n \int_0^\infty r^{n-1} \int_{\Delta_p^n} f(rx) \, d\mu_p(x) \, dr, \tag{13}$$

which holds for every $f \in L_1(\mathbb{R}_+^n)$.

The formula (13) allows one to transfer immediately the results for the average surface
best m-term approximation with respect to $\mu_p$ to the average volume approximation
with respect to $\nu_p$.

Proposition 3. The identity

$$\sigma_m^{p,q}(\nu_p) = \frac{n}{n+1} \cdot \sigma_m^{p,q}(\mu_p)$$

holds for all $0 < p \le q \le \infty$, all $n \ge 2$ and all $0 \le m \le n-1$.

Proof. We plug the function

$$f(x) = \sigma_m(x)_q \cdot \chi_{[0,1] \cdot \Delta_p^n}(x)$$

into (13) and obtain

$$\begin{aligned}
\sigma_m^{p,q}(\nu_p) &= \int_{[0,1] \cdot \Delta_p^n} \sigma_m(x)_q \, d\nu_p(x) = \frac{1}{\lambda([0,1] \cdot \Delta_p^n)} \int_{[0,1] \cdot \Delta_p^n} \sigma_m(x)_q \, d\lambda(x) \\
&= n \int_0^1 r^{n-1} \int_{\Delta_p^n} \sigma_m(rx)_q \, d\mu_p(x) \, dr = n \int_0^1 r^n \, dr \cdot \sigma_m^{p,q}(\mu_p),
\end{aligned}$$

which gives the result. □

Proposition 3 shows that the ratio between approximation with respect to $\mu_p$ and
$\nu_p$ is equal to $1 + 1/n$. This justifies our interest in measures on $\Delta_p^n$. Furthermore,
it shows that the quantities $\sigma_m^{p,q}(\nu_p)$ and $\sigma_m^{p,q}(\mu_p)$ behave asymptotically (i.e. for
$n \to \infty$) very similarly.

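Let $p = 2$ and let $\omega_1, \ldots, \omega_n$ be independent normally distributed Gaussian
random variables. Then

$$\varrho_2(\mathcal{A}) = \mu_2(\mathcal{A}) = \mathbb{P}\Big( \frac{(\omega_1, \ldots, \omega_n)}{(\sum_{j=1}^n \omega_j^2)^{1/2}} \in \mathcal{A} \Big), \qquad \mathcal{A} \subset \mathbb{S}^{n-1}.$$

As noted in [33], this relation may be generalized to all values of p with $0 < p < \infty$.
Let $\omega_1, \ldots, \omega_n$ be independent random variables on $\mathbb{R}_+$, each with density

$$c_p e^{-t^p}, \qquad t \ge 0,$$

with respect to the Lebesgue measure, where $c_p = \frac{p}{\Gamma(1/p)} = \frac{1}{\Gamma(1/p+1)}$.
Then, cf. [43, Lemma 1],

$$\mu_p(\mathcal{A}) = \mathbb{P}\Big( \frac{(\omega_1, \ldots, \omega_n)}{(\sum_{j=1}^n \omega_j^p)^{1/p}} \in \mathcal{A} \Big), \qquad \mathcal{A} \subset \Delta_p^n.$$

We shall fix $\omega_1, \ldots, \omega_n$ to the end of this paper. Also the symbols $\mathbb{E}$ and $\mathbb{P}$ are always
taken with respect to these variables.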
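The probabilistic representation of the cone measure suggests a simple way to sample from it. The sketch below is an illustration, not the author's code: it assumes the elementary change of variables that if $G$ has the Gamma distribution with shape $1/p$ and unit scale, then $G^{1/p}$ has the density $c_p e^{-t^p}$ on $t \ge 0$. The function name and the seed are hypothetical.

```python
import math
import random

def sample_cone_measure(n, p, rng):
    """Draw x = omega / ||omega||_p, omega_i i.i.d. with density c_p * exp(-t**p), t >= 0."""
    # Assumption used here: G ~ Gamma(shape=1/p, scale=1) implies that
    # G**(1/p) has density (p / Gamma(1/p)) * exp(-t**p).
    omega = [rng.gammavariate(1.0 / p, 1.0) ** (1.0 / p) for _ in range(n)]
    norm = sum(t ** p for t in omega) ** (1.0 / p)
    return [t / norm for t in omega]

rng = random.Random(0)  # fixed seed, for reproducibility only
for p in (0.5, 1.0, 2.0):
    x = sample_cone_measure(8, p, rng)
    # Each sample lies on Delta_p^n, so the sum of x_j**p equals 1 up to rounding.
    print(p, sum(t ** p for t in x))
```

Averaging $\sigma_m(x)_q$ over many such samples gives a Monte Carlo approximation of $\sigma_m^{p,q}(\mu_p)$.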



2.1 The case q = oo 

In this section we deal with uniform approximation, i.e. with the case $q = \infty$. To
be able to imitate the calculation (6), we shall need several tools, which are the subject
of Lemmas 4, 5 and 6. Our main result of this section (Theorem 7) then provides
the estimate of $\sigma_m^{p,\infty}(\mu_p)$ from above for all m with $0 \le m \le n-1$. Furthermore, it
is shown that in the range $0 \le m \le \varepsilon_p n$ this estimate is also optimal.

Lemma 4. Let $0 < p < \infty$ and let $n \ge 2$ and $1 \le m \le n$ be natural numbers. Then

$$\int_{\Delta_p^n} x_m^* \, d\mu_p(x) = \frac{\Gamma(n/p)}{\Gamma(n/p + 1/p)} \, \mathbb{E}\,\omega_m^*.$$

Furthermore, there are two positive real numbers $C_p^1$ and $C_p^2$ depending only on p,
such that

$$C_p^1 \, \frac{\mathbb{E}\,\omega_m^*}{n^{1/p}} \le \int_{\Delta_p^n} x_m^* \, d\mu_p(x) \le C_p^2 \, \frac{\mathbb{E}\,\omega_m^*}{n^{1/p}}. \tag{14}$$
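Proof. We put $f(x) = x_m^* e^{-x_1^p - \cdots - x_n^p}$ and use the polar decomposition identity (13):

$$\begin{aligned}
\frac{1}{\lambda([0,1] \cdot \Delta_p^n)} \int_{\mathbb{R}_+^n} x_m^* e^{-x_1^p - \cdots - x_n^p} \, d\lambda(x) &= n \int_0^\infty r^{n-1} \int_{\Delta_p^n} (r x_m^*) \cdot e^{-(rx_1)^p - \cdots - (rx_n)^p} \, d\mu_p(x) \, dr \\
&= n \int_0^\infty r^{n-1} \cdot r e^{-r^p} \, dr \int_{\Delta_p^n} x_m^* \, d\mu_p(x)
\end{aligned}$$

or, equivalently,

$$\int_{\Delta_p^n} x_m^* \, d\mu_p(x) = \frac{\int_{\mathbb{R}_+^n} x_m^* e^{-x_1^p - \cdots - x_n^p} \, d\lambda(x)}{n \, \lambda([0,1] \cdot \Delta_p^n) \int_0^\infty r^n e^{-r^p} \, dr}. \tag{15}$$

The identity

$$\int_0^\infty r^n e^{-r^p} \, dr = \frac{\Gamma(n/p + 1/p)}{p}$$

follows by a simple substitution. Furthermore, we shall need the classical formula
of Dirichlet for the volume of the unit ball $B_{\ell_p^n}$ of $\ell_p^n$, cf. [21, p. 157],

$$\lambda([0,1] \cdot \Delta_p^n) = \frac{\lambda(B_{\ell_p^n})}{2^n} = \frac{\Gamma(1/p + 1)^n}{\Gamma(n/p + 1)}.$$

This allows us to reformulate (15) as

$$\int_{\Delta_p^n} x_m^* \, d\mu_p(x) = \frac{\Gamma(n/p + 1) \, \mathbb{E}\,\omega_m^*}{c_p^n \cdot n/p \cdot \Gamma(n/p + 1/p) \, \Gamma(1/p + 1)^n} = \frac{\Gamma(n/p) \, \mathbb{E}\,\omega_m^*}{\Gamma(n/p + 1/p)}.$$

Finally, we use Stirling's formula (11) to estimate

$$\frac{\Gamma(n/p)}{\Gamma(n/p + 1/p)} \le C_p \, \frac{(n/p)^{n/p - 1/2}}{(n/p + 1/p)^{n/p + 1/p - 1/2}} \le C_p \, n^{-1/p},$$

and similarly for the estimate from below. □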
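Lemma 5. Let $a \in \mathbb{R}$ and $\delta > 0$. Then

$$\int_\delta^\infty u^a e^{-u} \, du \le \delta^a e^{-\delta} \cdot \begin{cases} 1, & \text{if } a \le 0, \\ \dfrac{1}{1 - a/\delta}, & \text{if } a > 0 \text{ and } a/\delta < 1, \\ \dfrac{(a/\delta)^{\lceil a \rceil}}{1 - \delta/a}, & \text{if } a > 0 \text{ and } a/\delta > 1. \end{cases}$$

Proof. If $a \le 0$, we may estimate

$$\int_\delta^\infty u^a e^{-u} \, du \le \delta^a \int_\delta^\infty e^{-u} \, du = \delta^a e^{-\delta}.$$

If $0 < a \le 1$, we use partial integration and obtain

$$\int_\delta^\infty u^a e^{-u} \, du = \delta^a e^{-\delta} + a \int_\delta^\infty u^{a-1} e^{-u} \, du \le \delta^a e^{-\delta} (1 + a \delta^{-1}).$$

This is smaller than

$$\delta^a e^{-\delta} \Big( 1 + \frac{a}{\delta} + \frac{a^2}{\delta^2} + \cdots \Big) = \delta^a e^{-\delta} \, \frac{1}{1 - a/\delta}$$

if $a/\delta < 1$ and smaller than

$$\delta^a e^{-\delta} \, \frac{a}{\delta} \Big( 1 + \frac{\delta}{a} + \frac{\delta^2}{a^2} + \cdots \Big) = \delta^a e^{-\delta} \, \frac{a}{\delta} \cdot \frac{1}{1 - \delta/a}$$

if $a/\delta > 1$.

If $k - 1 < a \le k$ for some $k \in \mathbb{N}$, we iterate the partial integration and arrive at

$$\begin{aligned}
\int_\delta^\infty u^a e^{-u} \, du &\le \delta^a e^{-\delta} \big( 1 + a \delta^{-1} + a(a-1) \delta^{-2} + \cdots + a(a-1) \cdots (a-k+1) \delta^{-k} \big) \\
&\le \delta^a e^{-\delta} \Big( 1 + \frac{a}{\delta} + \frac{a^2}{\delta^2} + \cdots + \frac{a^k}{\delta^k} \Big) \\
&\le \delta^a e^{-\delta} \cdot \begin{cases} \dfrac{1}{1 - a/\delta}, & \text{if } a/\delta < 1, \\ \dfrac{(a/\delta)^k}{1 - \delta/a}, & \text{if } a/\delta > 1. \end{cases}
\end{aligned}$$

□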
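The estimates of Lemma 5 can be sanity-checked for integer exponents, where the integral has a closed form. The sketch below is an illustration under assumptions: it uses the classical identity $\int_\delta^\infty u^k e^{-u}\,du = k!\, e^{-\delta} \sum_{j=0}^{k} \delta^j / j!$ for integers $k \ge 0$, and for $a/\delta > 1$ it takes the bound $\delta^a e^{-\delta} (a/\delta)^{\lceil a \rceil} / (1 - \delta/a)$ that results from the iterated partial integration in the proof. The function names are hypothetical.

```python
import math

def tail_integral_int(k, delta):
    """Closed form of the integral of u**k * exp(-u) over [delta, inf) for integer k >= 0."""
    partial_sum = sum(delta ** j / math.factorial(j) for j in range(k + 1))
    return math.factorial(k) * math.exp(-delta) * partial_sum

def lemma5_bound(a, delta):
    """Upper bound in the three regimes (the third one as assumed in the lead-in)."""
    base = delta ** a * math.exp(-delta)
    if a <= 0:
        return base
    ratio = a / delta
    if ratio < 1:
        return base / (1.0 - ratio)
    if ratio > 1:
        return base * ratio ** math.ceil(a) / (1.0 - 1.0 / ratio)
    raise ValueError("the case a == delta is not covered by the estimate")

checks = []
for k in range(6):
    for delta in (0.3, 1.5, 2.5, 7.0, 20.0):
        exact = tail_integral_int(k, delta)
        checks.append(exact <= lemma5_bound(k, delta) * (1.0 + 1e-12))
print(all(checks))
```

Only integer exponents are covered, so this is a plausibility check rather than a verification of the lemma.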

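Lemma 6. Let $0 < p < \infty$. Then there is a positive real number $C_p$ such that

$$\mathbb{E}\,\omega_m^* \le C_p \Big[ \log\Big( \frac{en}{m} \Big) \Big]^{1/p}$$

for all $1 \le m \le n$.

Proof. We estimate

$$\begin{aligned}
\mathbb{E}\,\omega_m^* &= \int_0^\infty \mathbb{P}(\omega_m^* > t) \, dt \le \delta + \int_\delta^\infty \mathbb{P}(\omega_m^* > t) \, dt \\
&\le \delta + \binom{n}{m} \int_\delta^\infty \mathbb{P}(\omega_1 > t, \omega_2 > t, \ldots, \omega_m > t) \, dt \\
&= \delta + \binom{n}{m} \int_\delta^\infty \mathbb{P}(\omega_1 > t)^m \, dt.
\end{aligned} \tag{16}$$

The parameter $\delta > \max(1, 3(1/p - 1))^{1/p}$ is to be chosen later on. We substitute
$v = u^p$ and obtain

$$\mathbb{P}(\omega_1 > t) = c_p \int_t^\infty e^{-u^p} \, du = \frac{c_p}{p} \int_{t^p}^\infty v^{1/p - 1} e^{-v} \, dv.$$

Using the first two estimates of Lemma 5 (recall that $t^p \ge \delta^p > \max(1, 3(1/p - 1))$),
we arrive at

$$\mathbb{P}(\omega_1 > t) \le C_p \, t^{1-p} e^{-t^p},$$

where $C_p$ depends only on p. We plug this estimate into (16) and obtain

$$\mathbb{E}\,\omega_m^* \le \delta + \binom{n}{m} (C_p)^m \int_\delta^\infty t^{m(1-p)} e^{-m t^p} \, dt. \tag{17}$$

If $p \ge 1$, then

$$\int_\delta^\infty t^{m(1-p)} e^{-m t^p} \, dt \le \delta^{m(1-p)} \int_\delta^\infty e^{-m t^p} \, dt = \delta^{m(1-p)} \frac{1}{p \, m^{1/p}} \int_{m \delta^p}^\infty u^{1/p - 1} e^{-u} \, du \le e^{-m \delta^p}.$$

Altogether, we obtain

$$\mathbb{E}\,\omega_m^* \le \delta + \binom{n}{m} (C_p)^m e^{-m \delta^p}.$$

Using $\binom{n}{m} \le \big( \frac{en}{m} \big)^m$ and choosing $\delta = C_p' \big[ \ln\big( \frac{en}{m} \big) \big]^{1/p}$ finishes the proof.

If $p < 1$, we use again the second estimate of Lemma 5:

$$\begin{aligned}
\int_\delta^\infty t^{m(1-p)} e^{-m t^p} \, dt &= \frac{1}{mp} \Big( \frac{1}{m} \Big)^{(1/p - 1)(m+1)} \int_{m \delta^p}^\infty u^{(1/p - 1)(m+1)} e^{-u} \, du \\
&\le \frac{1}{mp} \, \delta^{(1-p)(m+1)} e^{-m \delta^p} \cdot \frac{1}{1 - \frac{2(1/p - 1)}{\delta^p}} \le C_p \, \delta^{(1-p)(m+1)} e^{-m \delta^p}.
\end{aligned}$$

Using (17) and $\binom{n}{m} \le \big( \frac{en}{m} \big)^m$ again, we get

$$\begin{aligned}
\mathbb{E}\,\omega_m^* &\le \delta + \exp\big( -m \delta^p + m \ln(en/m) + (1-p)(m+1) \ln \delta + m \ln C_p + \ln C_p \big) \\
&\le \delta + \exp\big[ -m \big( \delta^p - C_p \ln(en/m) - 2(1-p) \ln \delta \big) \big].
\end{aligned}$$

The choice $\delta = C_p' \big[ \ln\big( \frac{en}{m} \big) \big]^{1/p}$ with $C_p'$ large enough ensures that

$$\frac{\delta^p}{2} \ge C_p \ln(en/m) \quad \text{and} \quad \frac{\delta^p}{2} \ge 2(1-p) \ln \delta$$

and finishes the proof.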

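□

The following theorem gives the basic estimates of $\sigma_m^{p,\infty}(\mu_p)$.

Theorem 7. Let $0 < p \le \infty$ and let $n \ge 2$.

(i) Let $0 \le m \le n-1$. Then

$$\sigma_m^{p,\infty}(\mu_p) \le C_p \Big[ \frac{\log\big( \frac{en}{m+1} \big)}{n} \Big]^{1/p}. \tag{18}$$

(ii) There is a number $0 < \varepsilon_p < 1$ such that for $0 \le m \le \varepsilon_p n$ the following estimate
holds:

$$\sigma_m^{p,\infty}(\mu_p) \ge C_p \Big[ \frac{\log\big( \frac{en}{m+1} \big)}{n} \Big]^{1/p}. \tag{19}$$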

Proof. Lemma 4 and Lemma 6 imply immediately the first part of the theorem if
$p < \infty$. If $p = \infty$, the proof is trivial.

The proof of the second part is divided into two steps.

Step 1. We start first with the case $m = 0$.

If $p = \infty$, then $x_1^* = 1$ for all $x \in \Delta_p^n$ and the proof is trivial. Let us therefore
assume that $p < \infty$. According to Lemma 4, we have to estimate $\mathbb{E}\,\omega_1^*$ from below.
This was done in [43, Lemma 2]. We include a slightly different proof for the reader's
convenience. For every $t_0 > 0$, it holds

$$\mathbb{E}\,\omega_1^* \ge t_0 \, \mathbb{P}(\omega_1^* > t_0) = t_0 \, \mathbb{P}\Big( \max_{1 \le j \le n} \omega_j > t_0 \Big) \ge t_0 \Big[ n \, \mathbb{P}(\omega_1 > t_0) - \binom{n}{2} \mathbb{P}(\omega_1 > t_0)^2 \Big].$$

We define $t_0$ by $\mathbb{P}(\omega_1 > t_0) = \frac{1}{n}$ and obtain $\mathbb{E}\,\omega_1^* \ge t_0/2$.
From the simple estimate

$$\frac{c_p}{p} \int_{T^p}^\infty u^{1/p - 1} e^{-u} \, du \ge C_p \, e^{-2T^p}, \qquad T > 1,$$

it follows that there is a positive real number $\gamma_p > 0$ such that

$$\mathbb{P}\big( \omega_1 > \gamma_p (\log(en))^{1/p} \big) \ge 1/n.$$

This gives $t_0 \ge \gamma_p (\log(en))^{1/p}$ and $\mathbb{E}\,\omega_1^* \ge C_p (\log(en))^{1/p}$.

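Step 2. Let $0 < m \le \varepsilon_p n$, where $\varepsilon_p > 0$ will be chosen later on.
We shall use the inequality

$$\frac{1}{m} \sum_{j=1}^m \log^{1/p}\Big( \frac{en}{j} \Big) \le C_p \log^{1/p}\Big( \frac{en}{m} \Big), \qquad 1 \le m \le n, \tag{20}$$

which follows by direct calculation for $p = 1$, by Hölder's inequality for $1 < p < \infty$
and by replacing the sum by the corresponding integral and integration by parts if
$0 < p < 1$.
We denote

$$\|\omega\|_{(m)} = \frac{1}{m} \sum_{j=1}^m \omega_j^*.$$

By Lemma 6 and (20),

$$\mathbb{E}\,\|\omega\|_{(m)} = \frac{1}{m} \sum_{j=1}^m \mathbb{E}\,\omega_j^* \le \frac{C_p}{m} \sum_{j=1}^m \log^{1/p}\Big( \frac{en}{j} \Big) \le C_p^2 \log^{1/p}\Big( \frac{en}{m} \Big). \tag{21}$$

To estimate $\mathbb{E}\,\|\omega\|_{(m)}$ from below, we assume that $1 \le m \le n$ and that $n/m$ is
an integer (otherwise one has to slightly modify the argument at the cost of the
constants involved). We partition the set $\{1, \ldots, n\} = A_1 \cup \cdots \cup A_m$, where each
one of the disjoint sets $A_j$ has $n/m$ elements. Then we have

$$\|\omega\|_{(m)} \ge \frac{1}{m} \sum_{j=1}^m \max_{l \in A_j} \omega_l$$

and by the first step we obtain

$$\mathbb{E}\,\|\omega\|_{(m)} \ge \frac{1}{m} \sum_{j=1}^m \mathbb{E} \max_{l \in A_j} \omega_l \ge C_p^1 \log^{1/p}\Big( \frac{en}{m} \Big). \tag{22}$$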
3=1 
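Let $N_p < 1/\varepsilon_p$ be a natural number to be chosen later on. Combining (21) with
(22) gives finally

$$\mathbb{E}\,\omega_m^* \ge \frac{1}{N_p m} \sum_{j=m+1}^{N_p m} \mathbb{E}\,\omega_j^* = \mathbb{E}\,\|\omega\|_{(N_p m)} - \frac{1}{N_p} \mathbb{E}\,\|\omega\|_{(m)} \ge C_p^1 \log^{1/p}\Big( \frac{en}{N_p m} \Big) - \frac{C_p^2}{N_p} \log^{1/p}\Big( \frac{en}{m} \Big).$$

An appropriate choice of $N_p$ and $\varepsilon_p$ (i.e. $N_p > 2^{1/p} C_p^2 / C_p^1$ and $\varepsilon_p \le \min(1/N_p, e/N_p^2)$),
together with the inequality $\log\big( \frac{en}{N_p m} \big) \ge \frac{1}{2} \log\big( \frac{en}{m} \big)$, which holds for $m \le \varepsilon_p n$,
gives the result. □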

Remark 1. (i) Theorem 7 provides basic estimates of average best m-term widths
$\sigma_m^{p,\infty}(\mu_p)$. In the case $m = 0$ a stronger result on concentration of $\mu_p$ was
obtained already in [43, Theorem 3 and Remark 2]. It would certainly be of
interest to obtain a similar statement also for other values of $m > 0$, but this
would go beyond the scope of this paper and we leave this direction open for
further study.

(ii) Theorem 7 may be interpreted in the sense of the discussion after formula
(7). Namely, the average coordinate of $x \in \Delta_p^n$ is $n^{-1/p}$. Theorem 7 shows
that the average value of the largest coordinate is only slightly larger (namely
$c[\ln(en)]^{1/p}$ times larger). In this sense, the average point of $\Delta_p^n$ is only slightly
modified (and properly normalized) white noise.

(iii) Using the interpolation formula one may immediately extend this result
to all $0 < p \le q \le \infty$. But we shall see later on that in the case $q < \infty$ one
may prove slightly better estimates.

(iv) The behavior of $\sigma_m^{p,\infty}(\mu_p)$ was studied in detail in [28, Example 10] for $p = 2$.
It was shown that if $\omega_i$ are independent $N(0,1)$ Gaussian random variables
and $m \le n/2 + 1$, then

$$c \sqrt{\ln \frac{2n}{m}} \le \mathbb{E}\,\omega_m^* \le C \sqrt{\ln \frac{2n}{m}},$$

where c and C are absolute positive constants. Furthermore, if $m \ge n/2 + 1$,
then

$$\sqrt{\frac{\pi}{2}} \cdot \frac{n - m + 1}{n + 1} \le \mathbb{E}\,\omega_m^* \le \sqrt{2\pi} \cdot \frac{n - m + 1}{n}.$$

(v) The method used in the proof of the second part of Theorem 7 may be found
for example in [27].
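The rate in Theorem 7 can be observed numerically. The following sketch is an illustration only: it samples the cone measure through the probabilistic representation with the variables $\omega_j$ (realized via Gamma variates, an assumption about implementation rather than a statement of the paper) and compares a Monte Carlo estimate of $\sigma_m^{p,\infty}(\mu_p)$ with $[\log(en/(m+1))/n]^{1/p}$. The parameter values and names are arbitrary.

```python
import math
import random

def sample_cone_measure(n, p, rng):
    """Draw x = omega / ||omega||_p with omega_i = G_i**(1/p), G_i ~ Gamma(1/p, 1)."""
    omega = [rng.gammavariate(1.0 / p, 1.0) ** (1.0 / p) for _ in range(n)]
    norm = sum(t ** p for t in omega) ** (1.0 / p)
    return [t / norm for t in omega]

def average_width_sup(n, m, p, trials, rng):
    """Monte Carlo estimate of sigma_m^{p,infty}(mu_p), i.e. of the mean of x*_{m+1}."""
    total = 0.0
    for _ in range(trials):
        x = sorted(sample_cone_measure(n, p, rng), reverse=True)
        total += x[m]
    return total / trials

def reference_rate(n, m, p):
    # Right-hand side of (18) and (19) without the constant C_p.
    return (math.log(math.e * n / (m + 1)) / n) ** (1.0 / p)

rng = random.Random(0)  # fixed seed, for reproducibility only
n, p = 200, 1.0
for m in (0, 4, 9, 49):
    estimate = average_width_sup(n, m, p, 500, rng)
    # The ratio in the last column should stay of order one.
    print(m, estimate, estimate / reference_rate(n, m, p))
```

For larger m relative to n the ratio drifts, which is consistent with the lower bound (19) being claimed only in the range $0 \le m \le \varepsilon_p n$.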
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2.2 The case q < oo 

We discuss briefly also the case when q < oo. It turns out, that in this case the 
logarithmic term disappears. We do not go much into details and restrict ourselves 
to the case m = 0. 

Proposition 8. Let n > 2 and < p < q < oo. Then 
(i) Cp gn 1 ^ < E \\x\\ q < Cp^n 1 /*, 
(ii) 

-y E ||x|| g pq \ _ f ii ii j ( \ ^ n" 1 ^ ll x llg 
°p,g ' n i/ P - a o IMpJ - / \\x\\ q anp{x) s ^ p , q ■ ^ 1/p 

p 

and 

(in) C^n 1 /^ < a£'V P ) < Ciy/i-V?, 

where in all these estimates C p and C p are positive real numbers depending only on 
p. 

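Proof. (i) The following two inequalities may be easily proved by Hölder's and Minkowski's inequality:

$$n^{1/q-1}\sum_{j=1}^n x_j \le \Bigl(\sum_{j=1}^n x_j^q\Bigr)^{1/q} \quad\text{and}\quad E\Bigl(\sum_{j=1}^n x_j^q\Bigr)^{1/q} \le \Bigl(\sum_{j=1}^n E\,x_j^q\Bigr)^{1/q}, \qquad q \ge 1,$$

$$n^{1/q-1}\sum_{j=1}^n x_j \ge \Bigl(\sum_{j=1}^n x_j^q\Bigr)^{1/q} \quad\text{and}\quad E\Bigl(\sum_{j=1}^n x_j^q\Bigr)^{1/q} \ge \Bigl(\sum_{j=1}^n E\,x_j^q\Bigr)^{1/q}, \qquad q \le 1.$$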

This gives for $q \ge 1$

$$E\|x\|_q \le n^{1/q}\,(E\,x_j^q)^{1/q} \quad\text{and}\quad E\|x\|_q \ge n^{1/q}\,E\,x_j$$

and for $q \le 1$

$$E\|x\|_q \le n^{1/q}\,E\,x_j \quad\text{and}\quad E\|x\|_q \ge n^{1/q}\,(E\,x_j^q)^{1/q}.$$

Let us note that the values of $E\,x_j$ and $(E\,x_j^q)^{1/q}$ do not depend on $n$, only on $p$ and $q$.

(ii) The proof of the second part resembles very much the proof of Lemma H] and 
is left to the reader. 

(iii) The last point follows immediately from (i) and (ii). □ 

Remark 2. A similar statement to Proposition [5] is included in \i'6\ Lemma 2, point 
4]- 
3 Normalized surface measure

In this section we study the average best $m$-term widths for another classical measure on $\Delta_p^n$, namely the normalized Hausdorff measure, cf. Definition 9. Intuitively, this measure gives more weight to those areas where one or more components of $x \in \Delta_p^n$ are close to zero. It turns out that this is really the case, with the mathematical formulation given in Lemma 10 below. This relation is then used together with Lemma 11 in Theorem 12 to provide estimates of $\sigma_0^{p,\infty}(\varrho_p)$ from above.

Definition 9. Let $n \ge 2$ be a natural number. We denote by

$$\varrho_p(A) = \frac{\mathcal{H}(A)}{\mathcal{H}(\Delta_p^n)}, \qquad A \subset \Delta_p^n,$$

the normalized $n-1$ dimensional Hausdorff measure on $\Delta_p^n$.

Let us mention that for $p \in \{1, 2, \infty\}$ the measure $\varrho_p$ coincides with $\mu_p$. The following lemma provides a relationship between the normalized surface measure $\varrho_p$ and the cone measure $\mu_p$. For $p \ge 1$, it was given in [38]. We follow their approach closely, and it turns out that it may be generalized also to the non-convex case of $0 < p < 1$.

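Lemma 10. Let $0 < p < \infty$ and $n \ge 2$. Then $\varrho_p$ is an absolutely continuous measure with respect to $\mu_p$ and for $\mu_p$-almost every $x \in \Delta_p^n$ it holds

$$\frac{d\varrho_p}{d\mu_p}(x) = \frac{n\,\lambda([0,1]\cdot\Delta_p^n)}{\mathcal{H}(\Delta_p^n)}\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2},$$

where

$$\frac{n\,\lambda([0,1]\cdot\Delta_p^n)}{\mathcal{H}(\Delta_p^n)} = \Bigl[\int_{\Delta_p^n}\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} d\mu_p(x)\Bigr]^{-1}$$

is the normalizing constant.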



Proof. The proof imitates the proof of [38, Lemma 1 and Lemma 2], where the statement was proven for $1 \le p < \infty$. Hence, we may assume that $0 < p < 1$. First, we introduce some notation.

We fix $x = (x_1, \dots, x_n) \in \Delta_p^n$ such that

• the mapping $y \to \|y\|_p$ is differentiable at $x$,

• $x$ is a density point of $\mathcal{H}$, i.e.

$$\lim_{\varepsilon\to 0}\frac{\mathcal{H}(B(x,\varepsilon)\cap\Delta_p^n)}{\varepsilon^{n-1}V_{n-1}} = 1, \qquad (23)$$

where $V_{n-1}$ denotes the Lebesgue volume of the $n-1$ dimensional Euclidean unit ball,

• $x_i > 0$ for all $i = 1, \dots, n$.

Obviously, $\varrho_p$-almost every $x \in \Delta_p^n$ satisfies all three properties (we refer for example to [34, Theorem 16.2] for the second one).

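Furthermore, we put $z := \nabla(\|\cdot\|_p)(x)$. This means that

$$\|x + y\|_p = 1 + \langle z, y\rangle + r(y), \qquad (24)$$

where

$$\theta(\delta) := \sup\Bigl\{\frac{|r(y)|}{\|y\|_2} : 0 < \|y\|_2 \le \delta\Bigr\}, \qquad \delta > 0,$$

tends to zero if $\delta$ tends to zero. Using (24) for $y = \delta x$, one observes that $\langle z, x\rangle = 1$. We denote by $H = x + z^{\perp}$ the tangent hyperplane to $\Delta_p^n$ at $x$. Let us note that for $0 < p < 1$ the set $\mathbb{R}^n_+ \setminus [0,1)\cdot\Delta_p^n = [1,\infty)\cdot\Delta_p^n$ is convex. Next, we show that $\langle z, y\rangle \ge 1$ for every $y \in [1,\infty)\cdot\Delta_p^n$. Indeed,

$$1 \le \|x + \lambda(y - x)\|_p = 1 + \langle z, \lambda(y - x)\rangle + r(\lambda(y - x)) = 1 - \lambda + \lambda\langle z, y\rangle + r(\lambda(y - x)).$$

Dividing by $\lambda > 0$ and letting $\lambda \to 0$ gives the statement.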

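The proof of the lemma is based on the following two inclusions, namely

$$[0,1]\cdot\bigl(B(x,\varepsilon(1-\theta(\varepsilon)))\cap H\bigr) \subset [0,1]\cdot\bigl(B(x,\varepsilon)\cap\Delta_p^n\bigr) \qquad (25)$$

and

$$[0,1]\cdot\bigl(B(x,\varepsilon)\cap\Delta_p^n\bigr) \subset [0,1+\varepsilon\theta(\varepsilon)]\cdot\bigl(B(x,\varepsilon(1+\theta(\varepsilon)\|x\|_2))\cap H\bigr), \qquad (26)$$

which hold for all $\varepsilon > 0$ small enough.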

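First, we prove (25). Given $0 \le s \le 1$ and $v \in B(x,\varepsilon(1-\theta(\varepsilon)))\cap H$, we need to find $0 \le t \le 1$ and $w \in B(x,\varepsilon)\cap\Delta_p^n$ such that $sv = tw$. To do this, we set

$$w := \frac{v}{\|v\|_p} \in \Delta_p^n \quad\text{and}\quad t := s\|v\|_p.$$

We need to show that $t \le 1$ and $\|x - w\|_2 \le \varepsilon$.
We choose $0 < \varepsilon < \min_j x_j$. Then

$$x_i \le |x_i - v_i| + v_i \le \|x - v\|_2 + v_i < \varepsilon + v_i$$

for every $i = 1, \dots, n$, which implies that $v_i > 0$ and $v \in \mathbb{R}^n_+$. From $v \in H$ and $w \in \Delta_p^n$ we deduce that $\|v\|_p \le 1$; indeed, $1 = \langle z, v\rangle = \|v\|_p\langle z, w\rangle \ge \|v\|_p$. Hence $t = s\|v\|_p \le \|v\|_p \le 1$.

Next, we write

$$\|x - w\|_2 = \Bigl\|x - \frac{v}{\|v\|_p}\Bigr\|_2 \le \|x - v\|_2 + \Bigl\|v - \frac{v}{\|v\|_p}\Bigr\|_2 \le \varepsilon(1-\theta(\varepsilon)) + \frac{\|v\|_2}{\|v\|_p}\bigl(1 - \|v\|_p\bigr)$$
$$\le \varepsilon(1-\theta(\varepsilon)) + 1 - \|v\|_p = \varepsilon(1-\theta(\varepsilon)) + 1 - \{1 + \langle v - x, z\rangle + r(v - x)\}$$
$$= \varepsilon(1-\theta(\varepsilon)) - r(v - x) \le \varepsilon,$$

where we used $\|v\|_2 \le \|v\|_p$ for $0 < p < 1$, $\langle v - x, z\rangle = 0$ and $|r(v-x)| \le \theta(\varepsilon)\|v - x\|_2 \le \varepsilon\theta(\varepsilon)$.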
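Next, we prove (26). Given $0 \le t \le 1$ and $w \in B(x,\varepsilon)\cap\Delta_p^n$, we need to find some $0 \le s \le 1 + \varepsilon\theta(\varepsilon)$ and $v \in B(x,\varepsilon(1+\theta(\varepsilon)\|x\|_2))\cap H$ such that $tw = sv$. We put

$$s := t\langle w, z\rangle \quad\text{and}\quad v := \frac{w}{\langle w, z\rangle}.$$

Let us recall that we have shown above that $w \in \Delta_p^n$ implies $\langle w, z\rangle \ge 1$.

Of course, $tw = sv$ and $v \in H$ (as $\langle v, z\rangle = 1$). Hence, it remains to show that $s \le 1 + \varepsilon\theta(\varepsilon)$ and $\|v - x\|_2 \le \varepsilon(1 + \theta(\varepsilon)\|x\|_2)$.

The application of (24) gives

$$1 = \|w\|_p = \|x + (w - x)\|_p = 1 + \langle w - x, z\rangle + r(w - x),$$

which again forces $\langle w, z\rangle \le 1 + \varepsilon\theta(\varepsilon)$. Then $s = t\langle w, z\rangle \le \langle w, z\rangle \le 1 + \varepsilon\theta(\varepsilon)$.
Finally, we write

$$\|v - x\|_2 = \Bigl\|\frac{w}{\langle w, z\rangle} - x\Bigr\|_2 \le \Bigl\|\frac{w - x}{\langle w, z\rangle}\Bigr\|_2 + \Bigl\|\frac{x}{\langle w, z\rangle} - x\Bigr\|_2 \le \frac{\|w - x\|_2}{\langle w, z\rangle} + \frac{\langle w, z\rangle - 1}{\langle w, z\rangle}\,\|x\|_2 \le \varepsilon + \varepsilon\theta(\varepsilon)\|x\|_2.$$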
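Equipped with (25) and (26), we may finish the proof of the lemma. We write

$$\frac{d\varrho_p}{d\mu_p}(x) = \lim_{\varepsilon\to 0}\frac{\varrho_p(B(x,\varepsilon)\cap\Delta_p^n)}{\mu_p(B(x,\varepsilon)\cap\Delta_p^n)} = \frac{\lambda([0,1]\cdot\Delta_p^n)}{\mathcal{H}(\Delta_p^n)}\,\lim_{\varepsilon\to 0}\frac{\mathcal{H}(B(x,\varepsilon)\cap\Delta_p^n)}{\varepsilon^{n-1}V_{n-1}}\cdot\frac{\varepsilon^{n-1}V_{n-1}}{\lambda([0,1]\cdot[B(x,\varepsilon)\cap\Delta_p^n])}$$
$$= \frac{\lambda([0,1]\cdot\Delta_p^n)}{\mathcal{H}(\Delta_p^n)}\,\lim_{\varepsilon\to 0}\frac{\varepsilon^{n-1}V_{n-1}}{\lambda([0,1]\cdot[B(x,\varepsilon)\cap\Delta_p^n])}, \qquad (27)$$

where we have used (23). As the perpendicular distance between zero and $H$ is equal to $1/\|z\|_2$, we observe that

$$\lambda\bigl([0,1]\cdot(B(x,a)\cap H)\bigr) = \frac{a^{n-1}V_{n-1}}{n\|z\|_2}$$

holds for every $a > 0$. Using this, we get from (25) and (26)

$$\frac{[\varepsilon(1-\theta(\varepsilon))]^{n-1}V_{n-1}}{n\|z\|_2} = \lambda\bigl([0,1]\cdot(B(x,\varepsilon(1-\theta(\varepsilon)))\cap H)\bigr) \le \lambda\bigl([0,1]\cdot(B(x,\varepsilon)\cap\Delta_p^n)\bigr)$$
$$\le \lambda\bigl([0,1+\varepsilon\theta(\varepsilon)]\cdot(B(x,\varepsilon(1+\theta(\varepsilon)\|x\|_2))\cap H)\bigr) = [1+\varepsilon\theta(\varepsilon)]^n\cdot\frac{[\varepsilon(1+\theta(\varepsilon)\|x\|_2)]^{n-1}V_{n-1}}{n\|z\|_2}.$$

Combining these estimates with (27) and with $\|z\|_2 = \bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}$ (recall that $z_i = x_i^{p-1}$ for $\|x\|_p = 1$) gives the result. □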



Following lemma is analogous to Lemma0]and reduces the calculation of CTq (g p ) 
to inequalities for the estimated values of functions of the random variables xi, . . . , x n . 
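Lemma 11. Let $0 < p < \infty$. There exist two positive real numbers $C_p^1$ and $C_p^2$ such that

$$C_p^1\cdot\frac{E\,x_1^*\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}}{E\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}}\cdot n^{-1/p} \le \sigma_0^{p,\infty}(\varrho_p) = \int_{\Delta_p^n} x_1^*\,d\varrho_p(x) \le C_p^2\cdot\frac{E\,x_1^*\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}}{E\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}}\cdot n^{-1/p} \qquad (28)$$

for all $n \ge 2$.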



Proof. Only the inequalities need a proof. It resembles the proof of Lemma 2] and 
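is again based on the polar decomposition formula (13).
We plug the functions

$$f_1(x) = x_1^*\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} e^{-x_1^p-\dots-x_n^p} \quad\text{and}\quad f_2(x) = \Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} e^{-x_1^p-\dots-x_n^p}$$

into (13) and obtain

$$\sigma_0^{p,\infty}(\varrho_p) = \frac{\int_{\Delta_p^n} x_1^*\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}d\mu_p(x)}{\int_{\Delta_p^n}\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}d\mu_p(x)} = \frac{\int_{\mathbb{R}^n_+} f_1(x)\,dx\cdot\int_0^\infty r^{n+p-2}e^{-r^p}dr}{\int_{\mathbb{R}^n_+} f_2(x)\,dx\cdot\int_0^\infty r^{n+p-1}e^{-r^p}dr}$$
$$= \frac{E\,x_1^*\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}}{E\bigl(\sum_{i=1}^n x_i^{2p-2}\bigr)^{1/2}}\cdot\frac{\Gamma(n/p+1-1/p)}{\Gamma(n/p+1)}.$$

By Stirling's formula, the last factor is equivalent to $n^{-1/p}$ with constants of equivalence depending only on $p$. □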
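
The Stirling step at the end of this proof can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function names `gamma_ratio` and `rescaled_ratio` are illustrative and not part of the paper. It evaluates $\Gamma(n/p+1-1/p)/\Gamma(n/p+1)$ through log-gamma values and rescales by $n^{1/p}$; Stirling's formula suggests that the rescaled quantity approaches $p^{1/p}$, which is one admissible choice for the constants of equivalence.

```python
import math


def gamma_ratio(n: int, p: float) -> float:
    """Gamma(n/p + 1 - 1/p) / Gamma(n/p + 1), computed via log-gamma to avoid overflow."""
    a = n / p + 1.0
    return math.exp(math.lgamma(a - 1.0 / p) - math.lgamma(a))


def rescaled_ratio(n: int, p: float) -> float:
    """The ratio multiplied by n^(1/p); it should stay bounded away from 0 and infinity."""
    return gamma_ratio(n, p) * n ** (1.0 / p)


if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0):
        values = [rescaled_ratio(n, p) for n in (10, 1000, 100000)]
        print(p, values, "expected limit:", p ** (1.0 / p))
```

For $p = 1$ the ratio is exactly $1/n$, so the rescaled value equals one up to rounding, which gives a quick sanity check of the sketch.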

Theorem 12. Let $0 < p < \infty$. Then there is a positive real number $C_p$ such that

$$\sigma_0^{p,\infty}(\varrho_p) \le C_p\Bigl(\frac{\log(n+1)}{n}\Bigr)^{1/p} \qquad (29)$$

for all $n \ge 2$.

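Proof. We define a probability measure $\alpha_{p,n}$ on $\mathbb{R}^n_+$ by the density

$$c_{p,n}^{-1}\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} e^{-x_1^p-\dots-x_n^p}, \qquad c_{p,n} := \int_{\mathbb{R}^n_+}\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} e^{-x_1^p-\dots-x_n^p}\,dx,$$

with respect to the Lebesgue measure. Let us note that due to the inequality

$$\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} \le \sum_{i=1}^n x_i^{p-1}$$

the integral in the definition of $c_{p,n}$ really converges and $\alpha_{p,n}$ is well defined.
According to Lemma 11 we need to estimate

$$\int_{\mathbb{R}^n_+} x_1^*\,d\alpha_{p,n}(x).$$

We calculate for $\delta > 1$, which is to be chosen later on,

$$\int_{\mathbb{R}^n_+} x_1^*\,d\alpha_{p,n}(x) = \int_0^\infty \alpha_{p,n}(x_1^* > t)\,dt \le \delta + \int_\delta^\infty \alpha_{p,n}(x_1^* > t)\,dt \le \delta + n\int_\delta^\infty \alpha_{p,n}(x_1 > t)\,dt.$$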

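We write $x' = (x_2, \dots, x_n) \in \mathbb{R}^{n-1}_+$ and recall that $c_p^{-1} = \int_0^\infty e^{-t^p}dt$. Then

$$\alpha_{p,n}(x_1 > t) = c_{p,n}^{-1}\int_t^\infty\int_{\mathbb{R}^{n-1}_+}\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} e^{-x_1^p-\dots-x_n^p}\,dx'\,dx_1$$
$$\le c_{p,n}^{-1}\int_t^\infty\int_{\mathbb{R}^{n-1}_+}\Bigl[x_1^{p-1} + \Bigl(\sum_{i=2}^n x_i^{2p-2}\Bigr)^{1/2}\Bigr] e^{-x_1^p-\dots-x_n^p}\,dx'\,dx_1 =: I_1 + I_2,$$

where $I_1$ collects the term with $x_1^{p-1}$ and $I_2$ the term with the sum over $i = 2, \dots, n$.

The inequality

$$c_p c_{p,n} = c_p\int_{\mathbb{R}^n_+}\Bigl(\sum_{i=1}^n x_i^{2p-2}\Bigr)^{1/2} e^{-x_1^p-\dots-x_n^p}\,dx \ge c_p\int_0^\infty e^{-x_1^p}dx_1\cdot\int_{\mathbb{R}^{n-1}_+}\Bigl(\sum_{i=2}^n x_i^{2p-2}\Bigr)^{1/2} e^{-x_2^p-\dots-x_n^p}\,dx' = c_{p,n-1} \qquad (30)$$

shows that

$$I_1 = \frac{1}{c_p^{n-1}c_{p,n}}\int_t^\infty x_1^{p-1}e^{-x_1^p}\,dx_1 \le \frac{1}{c_{p,1}}\int_t^\infty x_1^{p-1}e^{-x_1^p}\,dx_1 = \frac{e^{-t^p}}{p\,c_{p,1}}.$$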
Using (30) again, we get also

$$I_2 = \frac{c_{p,n-1}}{c_{p,n}}\int_t^\infty e^{-x_1^p}\,dx_1 \le c_p\int_t^\infty e^{-x_1^p}\,dx_1 = \frac{c_p}{p}\int_{t^p}^\infty s^{1/p-1}e^{-s}\,ds.$$

If $p \ge 1$, we get

$$I_1 + I_2 \le C_p e^{-t^p}, \qquad t \ge 1, \qquad (31)$$

and

$$\int_{\mathbb{R}^n_+} x_1^*\,d\alpha_{p,n}(x) \le \delta + C_p n\int_\delta^\infty e^{-t^p}dt \le \delta + C_p'\,n\,e^{-\delta^p}.$$

By choosing $\delta = C_p[\log(n+1)]^{1/p}$, we get the result.

If $p < 1$, we use the second estimate of Lemma 5 and replace (31) with

$$I_1 + I_2 \le C_p\,t^{1-p}e^{-t^p}, \qquad t \ge t_0,$$

for $t_0 > 1$ large enough, and the result again follows by the choice of $\delta$.

□ 

Remark 3. (i) Theorem 12 shows that the average size of the largest coordinate of $x \in \Delta_p^n$ taken with respect to the normalized Hausdorff measure is again only slightly larger than $n^{-1/p}$. Hence, also in this case, the typical element of $\Delta_p^n$ seems to be far from being sparse and rather resembles properly normalized white noise in the sense described in the Introduction.

(ii) Using interpolation inequality (JH), one may again obtain a similar estimate 
also for $0 < p < q < \infty$, namely

$$\sigma_0^{p,q}(\varrho_p) \le C_{p,q}\Bigl(\frac{\log(n+1)}{n}\Bigr)^{1/p-1/q}.$$

It would probably be possible to avoid the logarithmic terms and provide improved estimates also for $m > 0$, but we shall not go in this direction. Our main aim in this section was to show that the normalized Hausdorff measure does not prefer sparse (or nearly sparse) vectors, and this was clearly demonstrated by Theorem 12.



4 Tensor product measures 

As discussed already in the Introduction and proved in Theorem 7 and Theorem 12, the average vectors of $\Delta_p^n$ with respect to the cone measure $\mu_p$ and with respect to the surface measure $\varrho_p$ behave "badly", meaning that (roughly speaking) many of their coordinates are approximately of the same size. As promised before, we shall now introduce a new class of measures, for which the random vector behaves in a completely different way. These measures are defined through their density with respect to the cone measure $\mu_p$. This density has a strong singularity near the points with vanishing coordinates.
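Definition 13. Let $0 < p < \infty$, $\beta > -1$ and $n \ge 2$. Then we define the probability measure $\theta_{p,\beta}$ on $\Delta_p^n$ by

$$\frac{d\theta_{p,\beta}}{d\mu_p}(x) = c_{p,\beta}^{-1}\cdot\prod_{i=1}^n x_i^\beta, \qquad x \in \Delta_p^n, \qquad (32)$$

where

$$c_{p,\beta} = \int_{\Delta_p^n}\prod_{i=1}^n x_i^\beta\,d\mu_p(x). \qquad (33)$$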



Remark 4. (i) If $0 > \beta > -1$, then (32) defines the density of $\theta_{p,\beta}$ with respect to $\mu_p$ only for points where $x_i \ne 0$ for all $i = 1, \dots, n$. That means that this density is defined $\mu_p$-almost everywhere. The definition is then complemented by the statement that $\theta_{p,\beta}$ is absolutely continuous with respect to $\mu_p$.

(ii) We shall see later on that the condition $\beta > -1$ ensures that (33) is finite.

(iii) It was observed already in [3] that the measures $\theta_{p,\beta}$ allow a formula similar to (14). We plug the function $f(x) = \chi_{[0,\infty)\cdot A}(x)\prod_{i=1}^n x_i^\beta e^{-\|x\|_p^p}$ into (13), where $A$ is any $\mu_p$-measurable subset of $\Delta_p^n$, and obtain

$$\int_{[0,\infty)\cdot A}\prod_{i=1}^n x_i^\beta e^{-\|x\|_p^p}\,d\lambda(x) = \lambda([0,1]\cdot\Delta_p^n)\cdot n\cdot\int_0^\infty r^{n-1+n\beta}e^{-r^p}dr\cdot\int_A\prod_{i=1}^n x_i^\beta\,d\mu_p(x).$$

We use a similar formula also for $A = \Delta_p^n$, which leads to

$$\theta_{p,\beta}(A) = \frac{\int_A\prod_{i=1}^n x_i^\beta\,d\mu_p(x)}{\int_{\Delta_p^n}\prod_{i=1}^n x_i^\beta\,d\mu_p(x)} = \frac{\int_{[0,\infty)\cdot A}\prod_{i=1}^n x_i^\beta e^{-\|x\|_p^p}\,d\lambda(x)}{\int_{\mathbb{R}^n_+}\prod_{i=1}^n x_i^\beta e^{-\|x\|_p^p}\,d\lambda(x)}.$$

Let $\omega' = (\omega_1', \dots, \omega_n')$ be a vector with independent identically distributed components with respect to the density $c_{p,\beta}\,t^\beta e^{-t^p}$, $t > 0$, where $c_{p,\beta}^{-1} = \int_0^\infty t^\beta e^{-t^p}dt$ is a normalizing constant. Up to a simple substitution, this is the well known gamma distribution. We observe that the distribution of random points with respect to $\theta_{p,\beta}$ equals the distribution of normalized vectors $\omega'$, i.e.

$$\theta_{p,\beta}(A) = P\Bigl(\frac{(\omega_1', \dots, \omega_n')}{\bigl(\sum_{j=1}^n (\omega_j')^p\bigr)^{1/p}} \in A\Bigr), \qquad A \subset \Delta_p^n. \qquad (34)$$

(iv) Of course, the same procedure might be considered also for other distributions. 
We leave this to future work. We also refer to the discussion on the recent 
work of Gribonval, Cevher, and Davies in the Introduction. 

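Lemma 14. Let $0 < p < \infty$, $\beta > -1$ and $n \ge 2$.

(i) Let $1 \le m \le n$. Then

$$\sigma_{m-1}^{p,\infty}(\theta_{p,\beta}) = \int_{\Delta_p^n} x_m^*\,d\theta_{p,\beta}(x) = \frac{\Gamma(n(\beta+1)/p)}{\Gamma(n(\beta+1)/p + 1/p)}\cdot\frac{\int_{\mathbb{R}^n_+} x_m^*\prod_{i=1}^n x_i^\beta e^{-\|x\|_p^p}\,dx}{\int_{\mathbb{R}^n_+}\prod_{i=1}^n x_i^\beta e^{-\|x\|_p^p}\,dx}.$$

(ii)

$$\int_{\mathbb{R}^n_+}\prod_{i=1}^n x_i^\beta e^{-\|x\|_p^p}\,dx = \Bigl(\frac{\Gamma((\beta+1)/p)}{p}\Bigr)^n.$$

Proof. The proof of the first part follows again by (13), this time used for the functions

$$f_1(x) = x_m^*\Bigl(\prod_{i=1}^n x_i^\beta\Bigr)e^{-x_1^p-\dots-x_n^p} \quad\text{and}\quad f_2(x) = \Bigl(\prod_{i=1}^n x_i^\beta\Bigr)e^{-x_1^p-\dots-x_n^p}.$$

The proof of the second part is straightforward. □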

It follows directly from (9) that $\Gamma(s)$ tends to infinity when $s$ tends to zero. The following lemma quantifies this phenomenon. Although the statement seems to be well known, we were not able to find a reference and we therefore provide at least a sketch of the proof.

Lemma 15. Let $C \approx 0.577\ldots$ denote the Euler constant. Then

$$\lim_{n\to\infty}\Bigl(\frac{\Gamma(1/n)}{n}\Bigr)^n = e^{-C}.$$

Proof. Since $\Gamma(1/n)/n = \Gamma(1 + 1/n)$, it is enough to show that

$$\lim_{n\to\infty} n\cdot\log(\Gamma(1 + 1/n)) = -C,$$

which (by using l'Hospital's rule) follows from

$$\lim_{n\to\infty}\frac{\int_0^\infty s^{1/n}e^{-s}\log s\,ds}{\int_0^\infty s^{1/n}e^{-s}\,ds} = -C.$$

But the numerator of this fraction is equal to $\Gamma'(1 + 1/n)$ and its denominator to $\Gamma(1 + 1/n)$. The whole fraction is therefore equal to $\psi(1 + 1/n)$, and $\psi(1 + 1/n) \to \psi(1) = -C$ as $n$ tends to infinity, cf. [1, Section 6.3.2, p. 258]. □
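
The limit in Lemma 15 is easy to observe numerically. The following is a minimal sketch in Python using only the standard library; the function name `lemma15_sequence` and the hard-coded value of the Euler constant are illustrative choices, not part of the paper. It uses the identity $\Gamma(1/n)/n = \Gamma(1+1/n)$, so that the $n$-th power can be formed from a log-gamma value without overflow.

```python
import math

# Euler's constant, written out to double precision (assumed value for this sketch).
EULER_C = 0.5772156649015329


def lemma15_sequence(n: int) -> float:
    """(Gamma(1/n) / n)^n, evaluated as exp(n * log Gamma(1 + 1/n))."""
    return math.exp(n * math.lgamma(1.0 + 1.0 / n))


if __name__ == "__main__":
    limit = math.exp(-EULER_C)
    for n in (10, 100, 1000, 10000):
        print(n, lemma15_sequence(n), "limit:", limit)
```

The printed values decrease towards $e^{-C}$, with an error that appears to shrink roughly like $1/n$.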

The next theorem shows that if $\beta = p/n - 1$, then the measure $\theta_{p,\beta}$ promotes sparsity, and one may even consider the limiting behavior as $n$ grows to infinity.

Theorem 16. Let $0 < p < \infty$ and let $n \ge 2$ and $1 \le m \le n$ be integers. Then

$$\sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1}) \ge C_p^1\,\frac{\Gamma(n+1)}{\Gamma(n-m+1)}\cdot\frac{\Gamma(n/p+n-m+1)}{\Gamma(n/p+n+1)} \qquad (35)$$

and

$$\sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1}) \le C_p^2\,\frac{\Gamma(n+1)}{\Gamma(n-m+1)}\Bigl[\frac{\Gamma(n/p+n-m+1)}{\Gamma(n/p+n+1)} + \frac{1}{m!}\Bigl(\frac{e^{-1}}{\Gamma(1/n)}\Bigr)^m\Bigr], \qquad (36)$$

where $C_p^1$ and $C_p^2$ are positive real numbers depending only on $p$.
Furthermore, for every fixed $m \in \mathbb{N}$,

p - xw < ]hnwfo% l ™ 1 (dp P / n --i) < limsupo^~ 1 (^p P / n _i) < (37) 



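where $C_p^1$ and $C_p^2$ are positive real numbers depending only on $p$.

Proof. First observe that $n(\beta+1)/p = 1$ for $\beta = p/n - 1$ and therefore

$$\frac{\Gamma(n(\beta+1)/p)}{\Gamma(n(\beta+1)/p + 1/p)} = \frac{1}{\Gamma(1 + 1/p)}$$

depends only on $p$. Due to Lemma 14 we have to estimate

$$\Bigl(\frac{p}{\Gamma(1/n)}\Bigr)^n\int_{\mathbb{R}^n_+} x_m^*\prod_{i=1}^n x_i^{p/n-1}e^{-x_1^p-\dots-x_n^p}\,dx. \qquad (38)$$

Let $t = x_m^*$ and let us assume that there is only one coordinate $j \in \{1, \dots, n\}$ such that $x_j = t$. Obviously, this assumption holds almost everywhere. Of course, we have $n$ possibilities for $j$. Furthermore, $m - 1$ from the remaining $n - 1$ components of $x$ are bigger than $t$ and the remaining $n - m$ components are smaller. This allows us to rewrite (38) as

$$\Bigl(\frac{p}{\Gamma(1/n)}\Bigr)^n n\binom{n-1}{m-1}\int_0^\infty t\cdot t^{p/n-1}e^{-t^p}\Bigl(\int_0^t u^{p/n-1}e^{-u^p}du\Bigr)^{n-m}\Bigl(\int_t^\infty u^{p/n-1}e^{-u^p}du\Bigr)^{m-1}dt.$$

Let us denote

$$\gamma = \Gamma(1/n) = \int_0^\infty s^{1/n-1}e^{-s}ds \quad\text{and}\quad y(\omega) = \frac{1}{\gamma}\cdot\int_0^\omega s^{1/n-1}e^{-s}ds. \qquad (39)$$

Then $y(\omega)$ is a non-decreasing function of $\omega$, $y(0) = 0$ and $\lim_{\omega\to\infty} y(\omega) = 1$. We denote by $\omega(y)$ its inverse function, i.e.

$$y = \frac{1}{\gamma}\cdot\int_0^{\omega(y)} s^{1/n-1}e^{-s}ds, \qquad 0 \le y < 1. \qquad (40)$$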
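Using this notation and the substitution $\omega = t^p$, we obtain

$$\sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1}) = \frac{1}{\Gamma(1+1/p)}\cdot\frac{\Gamma(n+1)}{\Gamma(m)\Gamma(n-m+1)}\int_0^1 \omega(y)^{1/p}\,y^{n-m}(1-y)^{m-1}\,dy, \qquad (41)$$

where $\omega(y)$ is given by (40).

Step 1. Estimate from below

The estimate

$$\gamma y = \int_0^{\omega(y)} s^{1/n-1}e^{-s}ds \le \int_0^{\omega(y)} s^{1/n-1}ds = n\,\omega(y)^{1/n}$$

implies together with Lemma 15 that

$$\omega(y) \ge \Bigl(\frac{\gamma y}{n}\Bigr)^n \ge c\,y^n$$

with $c$ independent of $n$. This gives finally

$$\sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1}) \ge \frac{c^{1/p}}{\Gamma(1+1/p)}\cdot\frac{\Gamma(n+1)}{\Gamma(m)\Gamma(n-m+1)}\int_0^1 y^{n/p+n-m}(1-y)^{m-1}\,dy$$
$$= \frac{c^{1/p}}{\Gamma(1+1/p)}\cdot\frac{\Gamma(n+1)}{\Gamma(m)\Gamma(n-m+1)}\,B(n/p+n-m+1, m) = \frac{c^{1/p}}{\Gamma(1+1/p)}\cdot\frac{\Gamma(n+1)}{\Gamma(n-m+1)}\cdot\frac{\Gamma(n/p+n-m+1)}{\Gamma(n/p+n+1)},$$

where we used the Beta function (10), and the proof of (35) is complete.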

Step 2. Estimate from above

Let us first take $y$ such that $1 - e^{-1}/\gamma \le y < 1$. Then $-\ln(\gamma(1-y)) \ge 1$ and

$$\int_{-\ln(\gamma(1-y))}^\infty s^{1/n-1}e^{-s}ds \le \int_{-\ln(\gamma(1-y))}^\infty e^{-s}ds = \gamma(1-y).$$

Hence,

$$\omega(y) \le -\ln(\gamma(1-y)), \qquad 1 - e^{-1}/\gamma \le y < 1. \qquad (42)$$

Finally, we observe that

$$f : y \to \int_{Cy^n}^\infty s^{1/n-1}e^{-s}ds$$

is a convex function on $\mathbb{R}_+$, $f(0) = \gamma$ and

$$f(1 - e^{-1}/\gamma) = \int_{C(1-e^{-1}/\gamma)^n}^\infty s^{1/n-1}e^{-s}ds \le \int_1^\infty s^{1/n-1}e^{-s}ds \le e^{-1},$$

if we choose $C$ so large that $C(1 - e^{-1}/\gamma)^n \ge 1$ for all $n \in \mathbb{N}$. This is indeed possible, since a byproduct of Lemma 15 is also the relation $\lim_{n\to\infty}\gamma/n = 1$. Using the convexity of $f$, we obtain

$$f(y) \le \gamma(1-y), \qquad 0 \le y \le 1 - e^{-1}/\gamma,$$

which further leads to

$$\omega(y) \le Cy^n, \qquad 0 \le y \le 1 - e^{-1}/\gamma. \qquad (43)$$

We insert (42) and (43) into (41) and obtain

$$\sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1}) \le \frac{1}{\Gamma(1+1/p)}\cdot\frac{\Gamma(n+1)}{\Gamma(m)\Gamma(n-m+1)}\bigl[C^{1/p}I_1 + I_2\bigr], \qquad (44)$$

where

$$I_1 = \int_0^{1-e^{-1}/\gamma} y^{n/p+n-m}(1-y)^{m-1}\,dy$$

and

$$I_2 = \int_{1-e^{-1}/\gamma}^1 \bigl[-\ln(\gamma(1-y))\bigr]^{1/p}\,y^{n-m}(1-y)^{m-1}\,dy.$$

The first integral may be estimated again using the Beta function, which gives

$$I_1 \le B(n/p+n-m+1, m). \qquad (45)$$

We denote by $k$ the uniquely defined integer such that $1/p \le k < 1/p + 1$ holds, and estimate

$$I_2 \le \int_{1-e^{-1}/\gamma}^1 |\ln(\gamma(1-y))|^{1/p}(1-y)^{m-1}\,dy \le I_{k,m} := \int_0^{e^{-1}/\gamma} |\ln(\gamma y)|^k\,y^{m-1}\,dy.$$

Next, we use partial integration to estimate $I_{k,m}$. We obtain

$$I_{k,m} = \frac{1}{m}\Bigl(\frac{e^{-1}}{\gamma}\Bigr)^m + \frac{k}{m}\,I_{k-1,m}.$$

Together with $I_{0,m} = 1/m\cdot(e^{-1}/\gamma)^m$, this leads finally to

$$I_2 \le I_{k,m} \le \frac{(k+1)!}{m}\Bigl(\frac{e^{-1}}{\gamma}\Bigr)^m.$$

This, together with (44) and (45), finishes the proof of (36).

The proof of (37) then follows directly by Stirling's formula (11). □



Remark 5. (i) Let us take $m = 0$. Then the formula (37) describes an essentially different behavior compared to the normalized cone and surface measure. Namely, the expected value of the largest coordinate of $x \in \Delta_p^n$ with respect to $\theta_{p,p/n-1}$ does not decay to zero with $n$ growing to infinity. We shall demonstrate this effect also numerically in the next section.

(ii) If $m > 0$, then (37) shows that $\sigma_{m-1}^{p,\infty}(\theta_{p,p/n-1})$ decays exponentially fast with $m$, as soon as $n$ is large enough. That means that for $n$ large enough, the average vector of $\Delta_p^n$ exhibits a strong sparsity-like structure. Namely, its $m$-th largest component decays exponentially with $m$.

(iii) We have chosen in (32) a different $\beta$ for each $n$, namely $\beta_n = p/n - 1 > -1$. This was of course a crucial ingredient in the proof of Theorem 16. It is not
difficult to modify the analysis of the proof of Theorem [TU] to the situation, 
when $\beta > -1$ is fixed for all $n \in \mathbb{N}$. In this case we obtain again that (up to logarithmic factors) $\sigma_0^{p,\infty}(\theta_{p,\beta})$ is equivalent to $n^{-1/p}$ with constants of equivalence depending on $p > 0$ and $\beta > -1$.

(iv) Last, but not least, we observe that one may choose $p = 1$ or even $p = 2$ in Theorem 16 and still obtain the exponential decay of coordinates as described by (37). It seems that there is no significant connection between sparsity of an average vector $x \in \Delta_p^n$ and the size of $p > 0$.



5 Numerical experiments 
5.1 Cone measure 

We would like to demonstrate the most significant effects of the theory also by numerical experiments. We start with the case of the cone measure. The key role is played by (14). It may be interpreted in the following way. To generate a random point on $\Delta_p^n$ with respect to the normalized cone measure, it is enough to generate $\omega_1, \dots, \omega_n$ with respect to the density $c_p e^{-t^p}$, $t > 0$, and then calculate

$$\frac{(\omega_1, \dots, \omega_n)}{\bigl(\sum_{j=1}^n \omega_j^p\bigr)^{1/p}} \in \Delta_p^n.$$

This method is very practical, as the running time of this algorithm depends only linearly on $n$.
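
The sampling recipe above can be sketched in a few lines. The following Python code is an illustration only: it uses the standard-library generator `random.gammavariate` rather than the GNU Scientific Library routine used for the experiments, and the function names `sample_cone_measure` and `mean_rearrangement` are hypothetical. It relies on the substitution $s = t^p$, under which a variable with density $c_p e^{-t^p}$ has a $p$-th power that is gamma distributed with shape $1/p$.

```python
import random


def sample_cone_measure(n, p, rng):
    """Draw one point of Delta_p^n from the normalized cone measure mu_p."""
    # If omega has density c_p * exp(-t^p) on (0, inf), then omega^p ~ Gamma(1/p, 1).
    g = [rng.gammavariate(1.0 / p, 1.0) for _ in range(n)]
    total = sum(g)
    # x_j = omega_j / (sum_i omega_i^p)^(1/p) = (g_j / total)^(1/p)
    return [(gj / total) ** (1.0 / p) for gj in g]


def mean_rearrangement(n, p, samples, seed=0):
    """Monte Carlo estimate of n^(1/p) * integral of x*_m d mu_p for m = 1, ..., n."""
    rng = random.Random(seed)
    acc = [0.0] * n
    for _ in range(samples):
        # Non-increasing rearrangement of the sampled point.
        x = sorted(sample_cone_measure(n, p, rng), reverse=True)
        for m in range(n):
            acc[m] += x[m]
    scale = n ** (1.0 / p) / samples
    return [a * scale for a in acc]


if __name__ == "__main__":
    estimate = mean_rearrangement(n=100, p=2.0, samples=1000)
    print(estimate[:5])
```

Each sample costs $n$ gamma draws and one sort, so a modest sample size already gives a rough picture; the sample size used here is far smaller than in the experiments reported in this section.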

Let us note that the values of $\omega_j$ may be generated very easily. For example, the package GNU Scientific Library [26] implements a random number generator with respect to the gamma distribution using the method described in the classical work of Knuth [31]. Using this package, we generated $10^8$ random points $x \in \Delta_p^n$ for $n = 100$ and $p \in \{1/2, 1, 2\}$ to approximate numerically the value of $n^{1/p}\cdot\int_{\Delta_p^n} x_m^*\,d\mu_p(x)$. The result may be found in Figure 1.



5.2 Tensor measures 

As pointed out in Remark 4, point (iii), a random point on $\Delta_p^n$ with respect to $\theta_{p,\beta}$ may be generated in the following way. We generate $\omega_1', \dots, \omega_n'$ with respect to the density $c_{p,\beta}\,t^\beta e^{-t^p}$, $t > 0$, where $c_{p,\beta}^{-1} = \int_0^\infty t^\beta e^{-t^p}dt$ is a normalizing constant, and we consider the vector

$$\frac{(\omega_1', \dots, \omega_n')}{\bigl(\sum_{j=1}^n (\omega_j')^p\bigr)^{1/p}} \in \Delta_p^n.$$

Also this may be easily done with the help of [26]. We generated again $10^8$ random points $x \in \Delta_p^n$ with respect to $\theta_{p,p/n-1}$ for $n = 100$ and $p \in \{1/2, 1, 2\}$. Then we used those points to numerically approximate the expression $\log_{10}\bigl(\int_{\Delta_p^n} x_m^*\,d\theta_{p,p/n-1}\bigr)$.




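[Figure 1, two panels with horizontal axis ticks 20 to 100: (a) $n^{1/p}\cdot\int_{\Delta_p^n} x_m^*\,d\mu_p(x)$, (b) $\log_{10}\bigl(\int_{\Delta_p^n} x_m^*\,d\theta_{p,p/n-1}\bigr)$.]

Figure 1: Approximations of $n^{1/p}\cdot\int_{\Delta_p^n} x_m^*\,d\mu_p(x)$ (left) and $\log_{10}\bigl(\int_{\Delta_p^n} x_m^*\,d\theta_{p,p/n-1}\bigr)$ (right) for $n = 100$, $p = 1/2$ (∘), $p = 1$ (•) and $p = 2$ (×), based on sampling of $10^8$ random points.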
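
The sampling procedure of Section 5.2 admits an equally short sketch. As before, this is an illustration under stated assumptions: `random.gammavariate` from the Python standard library stands in for the GNU Scientific Library generator, and the names `sample_tensor_measure` and `mean_largest` are hypothetical. With $s = t^p$, a variable with density proportional to $t^\beta e^{-t^p}$ has a $p$-th power that is gamma distributed with shape $(\beta+1)/p$, which equals $1/n$ for $\beta = p/n - 1$.

```python
import random


def sample_tensor_measure(n, p, beta, rng):
    """Draw one point of Delta_p^n from theta_{p,beta}; requires beta > -1."""
    # If omega' has density proportional to t^beta * exp(-t^p), then (omega')^p ~ Gamma((beta+1)/p, 1).
    g = [rng.gammavariate((beta + 1.0) / p, 1.0) for _ in range(n)]
    total = sum(g)
    # Normalizing in the gamma variables avoids taking small p-th roots before dividing.
    return [(gj / total) ** (1.0 / p) for gj in g]


def mean_largest(n, p, beta, samples, seed=0):
    """Monte Carlo estimate of the integral of the largest coordinate x*_1."""
    rng = random.Random(seed)
    return sum(max(sample_tensor_measure(n, p, beta, rng)) for _ in range(samples)) / samples


if __name__ == "__main__":
    n, p = 100, 1.0
    print("beta = p/n - 1:", mean_largest(n, p, p / n - 1.0, samples=1000))
    print("beta = 0 (cone measure):", mean_largest(n, p, 0.0, samples=1000))
```

Comparing the two printed values illustrates the effect described in Remark 5: with $\beta = p/n - 1$ the largest coordinate stays of order one, while for $\beta = 0$ it is small.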